Debugging Active Directory integration with Kerberos

User login to Automation Suite

If users still cannot log in to Automation Suite (AS) after following these steps, use the steps below to debug the issue.

  1. If you configured Kerberos following this documentation, check that it was configured successfully.
  • In ArgoCD, open the UiPath application and look for the pods “services-preinstall-validations-job” and “kerberos-jobs-trigger”. These two jobs run kinit to generate the TGT once Kerberos is configured. If both pods are in a successful state, Kerberos is configured correctly. If either pod failed, check its logs.
  • In ArgoCD, open the UiPath application and look for the pod “kerberos-tgt-update”. This is a cron job that fetches a fresh TGT before the existing one expires. If this pod failed, check its logs.
  2. Check the AS pod statuses in ArgoCD. The applications should show up as synced and configured. If they do not, open ArgoCD and check whether the identity-service-api pods are in an unhealthy state. If a pod is unhealthy, click on it, navigate to the logs, choose identity-server, and inspect the logs.

  • If the logs show an error about not being able to reach the domain controller, the problem is at the network level. AS is trying to reach the domain specified in the configuration, but the DNS server that AS uses has no information about that domain. Work with your company’s IT department so that the AS cluster uses an official domain that is registered in the network’s DNS service.

  • If there is an LDAP connection error caused by an internal LDAP error (A local error occurred), this may be a known issue that occurs when AD integration was first configured one way (using the Kerberos configuration) and later switched to LDAPS, or vice versa. To work around this bug:

    a. Open the SQL database for AS.
    b. Navigate to the [identity].[DirectoryConnections] table and find the entry for “ldapad”. Set the “IsDeleted” value to True.
    c. Navigate to the [identity].[ExternalIdentityProviders] table and find the entry for “ldapad”. Set the “IsActive” column to True.
    d. Restart the identity pods.
    e. Configure AD integration through the host security settings again.

  3. If the AS pod statuses in ArgoCD show up as healthy, but the user cannot authenticate to Automation Suite from the browser:
  • Ensure the browser is running on a Windows machine that is joined to the domain, forest, or trusted forest to which the account representing the keytab belongs. Kerberos only works in such an environment. On MSI, Windows authentication works with NTLM, but in Automation Suite it does not.
  • If your cluster name resolves through a CNAME record in DNS, browsers resolve the CNAME record first before verifying the Kerberos service name. If this applies to you, or you don't know how the DNS record was created, and you use either Chrome or Edge, set this policy: Disable CNAME lookup when negotiating Kerberos authentication (admx.help). Then restart the browser and try again. If this works, either the DNS entry needs to be changed to an A record instead of a CNAME record, or this browser setting needs to be applied across the organization using Group Policy.
  • If the browser still shows the error An unsupported mechanism was requested, or repeatedly prompts for authentication, debug at the Kerberos level. On the Windows machine where the browser is running, turn on Kerberos logging as described in Enable Kerberos event logging - Windows Server | Microsoft Learn. First set the LogLevel value as specified in that doc, then retry the scenario.
    1. Start Event Viewer and navigate to the System log.
    2. The Kerberos events appear in the System log.
    3. Errors for KDC_ERR_PREAUTH_REQUIRED can be ignored. They only indicate that Kerberos authentication was attempted.
    4. KDC_ERR_S_PRINCIPAL_UNKNOWN or KRB_AP_ERR_MODIFIED means the SPN for the service account is not set correctly. The SPN HOST/ClusterName must be set on the account represented by the keytab configured in AS.
    5. KDC_ERR_ETYPE_NOTSUPP means some encryption types have been disabled in your organization. By default, Windows uses RC4-HMAC for account passwords unless the account is explicitly set to use AES128 or AES256. If you intend to force AES128 or AES256:
      a. Use AD Users and Computers to navigate to the user account that represents the keytab file used in AS.
      b. Click the Account tab, then select
        * This account supports Kerberos AES 256 bit encryption.
      c. Click OK to save the updated account information.
      d. This is important: reset the password of this account, regenerate the keytab, and update it in the AS configuration. This allows the new password hashes to be generated. Without this step, Kerberos authentication will still fail.
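
The error codes above can be kept as a small lookup table for triage. This is a minimal Python sketch, assuming you have copied the Kerberos events out of Event Viewer as plain text. The names KERBEROS_ERROR_ACTIONS and triage are hypothetical, and the action text only paraphrases this article.

```python
# Hypothetical helper: map the Kerberos error codes discussed in this
# article to the follow-up action the article suggests for each one.
KERBEROS_ERROR_ACTIONS = {
    "KDC_ERR_PREAUTH_REQUIRED": "Ignore; Kerberos authentication was attempted.",
    "KDC_ERR_S_PRINCIPAL_UNKNOWN": "Check the SPN on the account represented by the keytab.",
    "KRB_AP_ERR_MODIFIED": "Check the SPN on the account represented by the keytab.",
    "KDC_ERR_ETYPE_NOTSUPP": (
        "Enable AES on the account, reset its password, "
        "regenerate the keytab, and update it in AS."
    ),
}


def triage(event_text: str) -> list[str]:
    """Return 'CODE: action' for every known error code found in event_text."""
    return [
        f"{code}: {action}"
        for code, action in KERBEROS_ERROR_ACTIONS.items()
        if code in event_text
    ]


if __name__ == "__main__":
    # Example input is made up for illustration; paste your own event text.
    for line in triage("The KDC returned KDC_ERR_ETYPE_NOTSUPP for the request"):
        print(line)
```

The function does plain substring matching, so it works on whatever text you paste in, but it makes no attempt to parse the Windows event format.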

AS Authentication to SQL using Integrated Authentication

  1. To use integrated authentication, SQL Server must be running on a domain-joined machine, the SQL Server service must run as a domain account, and AS must be configured with the keytab corresponding to the SQL client account whose credentials will be used to connect to the SQL database. See Using Kerberos with SQL Server.
    a. As specified in the above document, the SPN MSSQLSvc/<ExactSQLServerNameUsedByAS>:1433 must be set on the service account that SQL Server runs as (not on the account of the keytab configured in AS).
    b. If the above has been configured correctly and the SQL connection still fails (according to the AS pod logs), debug the SQL Server integration from a Windows client with SQL Server Management Studio, connecting to the SQL Server with the same connection string used by Automation Suite.
    i. This debugging is similar to the browser Kerberos debugging in the user login section above. On the Windows machine with SQL Server Management Studio, turn on Kerberos logging as described in Enable Kerberos event logging - Windows Server | Microsoft Learn. First set the LogLevel value as specified in that doc, then try connecting to the SQL Server.

    1. Start Event Viewer and navigate to the System log.
    2. The Kerberos events appear in the System log.
    3. Errors for KDC_ERR_PREAUTH_REQUIRED can be ignored. They only indicate that Kerberos authentication was attempted.
    4. KDC_ERR_S_PRINCIPAL_UNKNOWN or KRB_AP_ERR_MODIFIED means the SPN for the SQL account is not set correctly. The SPN MSSQLSvc/<ExactSQLServerNameUsedByAS>:1433 must be set on the service account that SQL Server runs as, as described in step a. If your SQL Server name resolves through a CNAME record in DNS, clients may resolve the CNAME record first before verifying the Kerberos service name. The SQL Server name must map directly to an IP address, not through an alias.
    5. KDC_ERR_ETYPE_NOTSUPP means some encryption types have been disabled in your organization. By default, Windows uses RC4-HMAC for account passwords unless the account is explicitly set to use AES128 or AES256. If you intend to force AES128 or AES256:
     a. Use AD Users and Computers to navigate to the user account that represents the keytab file used in AS.
     b. Click the Account tab, then select
       * This account supports Kerberos AES 256 bit encryption.
     c. Click **OK** to save the updated account information.
     d. This is important: reset the password of this account, regenerate the keytab, and update it in the AS configuration.
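
To double-check the SPN string and the alias point above, something like the following Python sketch may help. It assumes you pass the exact fully qualified SQL Server name that AS uses. The function names are hypothetical, and the alias check is only a heuristic: socket.gethostbyname_ex reports whatever the local resolver returns, so hosts-file entries or DNS search suffixes can also make the canonical name differ.

```python
import socket


def expected_sql_spn(server_name: str, port: int = 1433) -> str:
    """Build the SPN in the MSSQLSvc/<server>:<port> form used in this article."""
    return f"MSSQLSvc/{server_name}:{port}"


def resolves_via_alias(name: str, resolver=socket.gethostbyname_ex) -> bool:
    """Return True if the resolver's canonical name differs from `name`.

    `resolver` is injectable so the check can be exercised without DNS.
    """
    canonical, _aliases, _addresses = resolver(name)
    return canonical.rstrip(".").lower() != name.rstrip(".").lower()


if __name__ == "__main__":
    # "sql.corp.example.com" is a placeholder; use your own server name.
    server = "sql.corp.example.com"
    print(expected_sql_spn(server))
    try:
        print("resolves via alias:", resolves_via_alias(server))
    except OSError as exc:
        print("lookup failed:", exc)
```

Run it from a machine that uses the same DNS servers as the AS cluster, otherwise the result may not reflect what the cluster sees.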
    

For SQL Kerberos connectivity, there is also this tool, which can be installed on the SQL Server to run some checks.


Another type of problem that you may encounter when integrating with AD:

When you look at the identity-service logs as mentioned in step 2 of the article, you may sometimes see

UiPath.IdentityServer.Directory.LdapAD.LdapADAdapter Ldap connection failed due to error 82 message A local error occurred.

This is a generic error from the LDAP libraries. We know of the following root causes for it. There may be other AD environment issues that trigger it with the OpenLDAP libraries. I will keep this document updated as we find more reasons.

  1. Upgrade to the latest available CU. It contains changes that turn off LDAP canonicalization, which may cause this issue when we use the OpenLDAP libraries.
  2. Ensure the DNS SRV records are properly cleaned up for domain controllers that have been de-provisioned. List the SRV records for your domain in your DNS server under _kerberos._udp.<domain>, and ensure they match one-to-one with those under _kerberos._tcp.<domain>. If entries were cleaned up from the TCP SRV records but not from the UDP SRV records, this issue can happen.
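
A rough sketch of that SRV comparison in Python, assuming you have already collected the target host names of the UDP and TCP SRV records into two lists (for example by copying them from your DNS console). The function name and the host names are hypothetical.

```python
def srv_mismatches(udp_targets, tcp_targets):
    """Compare SRV target host names and return (only_in_udp, only_in_tcp).

    Names are compared case-insensitively and without a trailing dot.
    """
    udp = {t.rstrip(".").lower() for t in udp_targets}
    tcp = {t.rstrip(".").lower() for t in tcp_targets}
    return sorted(udp - tcp), sorted(tcp - udp)


if __name__ == "__main__":
    # Placeholder data: dc2 was removed from the TCP records only.
    udp = ["dc1.corp.example.com.", "dc2.corp.example.com."]
    tcp = ["DC1.corp.example.com"]
    only_udp, only_tcp = srv_mismatches(udp, tcp)
    print("only in _kerberos._udp:", only_udp)
    print("only in _kerberos._tcp:", only_tcp)
```

Any name that shows up in only one of the two lists is a candidate for the stale-record cleanup described in item 2.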

To get more logging for the Kerberos integration, turn on Kerberos tracing by setting

- name: KRB5_TRACE 
  value: /dev/stderr

in the identity-service-api pod env section.
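
If you edit the manifest programmatically, the same change can be sketched in Python. This is only an illustration, assuming the container spec has already been loaded into a dict with the usual Kubernetes "env" list. The function name is hypothetical, and how you load and re-apply the manifest is outside this sketch.

```python
def with_krb5_trace(container: dict, target: str = "/dev/stderr") -> dict:
    """Return a copy of a container spec with KRB5_TRACE set in its env list.

    Any existing KRB5_TRACE entry is replaced, so calling this twice
    gives the same result as calling it once.
    """
    env = [e for e in container.get("env", []) if e.get("name") != "KRB5_TRACE"]
    env.append({"name": "KRB5_TRACE", "value": target})
    return {**container, "env": env}


if __name__ == "__main__":
    # Placeholder container spec; the real one has many more fields.
    spec = {"name": "identity-service-api", "env": [{"name": "EXISTING", "value": "1"}]}
    print(with_krb5_trace(spec)["env"])
```

The input dict is not modified, which makes it easy to diff the before and after specs.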

I have tried enabling Kerberos. All the pods mentioned above are running fine. However, at the host level, when I enable Kerberos auth and click “Test and save”, I get an error. It doesn’t show much information.

Any idea how I can resolve this?

I did try looking this up, but my identity.DirectoryConnections table is empty. Could this be causing the issue?

Hi Midhun, I would need to look at the logs, and potentially take feedback to improve the error message. Do you have a support ticket open? I would need your support bundle output to investigate. It is not a good idea to post it in the forums.

My general tips:

  1. The absence of the DirectoryConnections entry means that there was no directory configuration in this deployment ever. So that is not the problem here.
  2. Look for the errors in the identity-service-api pod logs in the platform application. They should give you a clear reason why it failed.
  3. In most cases, it is because the instructions from this document were not all followed, specifically, “Setting up kerberos authentication”

Thanks @Sriram_Vasudevan. I have sent you the support ticket details via DM. I'd appreciate it if you could take a look and let me know what the issue could be.

Midhun, from your support case, it looks like your DNS returned many unreachable domain controllers, and we encountered timeouts. You needed to clean up your DNS records and do a metadata cleanup of your unavailable DCs. Please confirm you were able to resolve the problem.
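
For anyone hitting the same symptom, a rough way to spot unreachable domain controllers is to try a TCP connection to each one on the Kerberos port (88). This is a minimal Python sketch, not something AS runs itself. The function name and host names are hypothetical, the timeout is an arbitrary choice, and it should be run from a machine with the same network path and DNS as the cluster.

```python
import socket


def unreachable_hosts(hosts, port=88, timeout=3.0, connect=socket.create_connection):
    """Return the hosts that do not accept a TCP connection within `timeout` seconds.

    `connect` is injectable so the logic can be exercised without a network.
    """
    failed = []
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts, and failed name lookups.
            failed.append(host)
        else:
            conn.close()
    return failed


if __name__ == "__main__":
    # Placeholder names; use the DC host names your DNS returns.
    dcs = ["dc1.corp.example.com", "dc2.corp.example.com"]
    print("unreachable:", unreachable_hosts(dcs))
```

Hosts that show up in the result are candidates for the DNS record and metadata cleanup mentioned above.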

Hey Sriram, sorry I missed responding. The issue was fixed with a workaround. Basically, we have around 10-15 DNS servers, including DR. When a connection is initialized, it tries the DNS servers one by one until it establishes connectivity. However, UiPath seems to have a predefined retry limit (either a number of servers or a timeout, I’m not really sure). Anyway, the working DNS server was outside these bounds, so it kept failing. Our DNS team configured direct routing of the connections from the Automation Suite server to a working DNS server (so it no longer tries them one by one), and connectivity was established.