Logs - ElasticSearch

Does anyone here use ElasticSearch?

What is the real benefit compared to the standard Orchestrator dashboard, the standard Orchestrator mail notifications (sent when jobs fail), the Orchestrator job logs, and the SQL/text files that store them by default?

Besides Elastic, is there any alternative solution that you are either using currently or have heard about?

I know how Elastic/Kibana works, but I'm just trying to get some insights/experiences…

Thanks a lot in advance!

Hi @Felipe_Kotinda,

Very good question.

Background:
UiPath uses NLog for all its log generation, and NLog is extremely customizable. Read more here: (Robot Logs)
In the Orchestrator web.config file you can include not only the default key values but also many other log parameters. Read more here: (Logging and Log Levels)

Once you have chosen the NLog config, you can save the log to a custom file, which your dashboard software then consumes.

Why not use the UiPath Orchestrator dashboards?
Although they are a good feature, I do not recommend the standard Orchestrator dashboard for anything more than a quick analysis.

Advantages of custom dashboards:

  1. You do not want to limit yourself to the Orchestrator dashboard, which only covers the previous 30 days.

  2. You can make dashboards both for management (monthly reports of robot utilization) and for surveillance (production errors).

  3. With a custom dashboard solution you can pinpoint any failure event and the log message obtained during the execution. This has saved us a lot of hassle in finding the reason for a failure.

  4. You can create not only charts but also use the dashboard as a tool for developers to identify failures in production.

  5. You can send exception cases / reports directly to manual case handlers.

Disadvantages:

  1. You will need some development time to set up project dashboards as per your design, and stakeholders need to be involved in that design.
  2. Data forwarder issues in any of the visualization tools also need attention, so that if the robot / Orchestrator has downtime, the data forwarder restarts as well.

Alternatives:
Other than UiPath Orchestrator and Kibana, you can use

Grafana,
Power BI,
Tableau,
Splunk,
UiPath Insights
and many others

All these tools perform two operations: a) parse the log files from robot execution, and b) prepare the required data for charting or reporting.

Our experience:
In our organization we chose to go with Splunk, as it was already available in-house and all we needed was a dedicated index for the RPA CoE. Splunk has a data forwarder on the robot / Orchestrator machine. All logs are reported to the defined index.

From the index we generate process execution reports (one dashboard per delivered project) and monthly execution reports (total robot transactions, total robot runtime in hours, manual process time without a robot, production date). The manual runtime is the estimated time given by process owners during the initial process evaluation.

Our management appreciates that the RPA CoE not only automates mundane work but also reports exactly what the savings are. Monthly reports show how the robots are performing on the different processes.

So, you are on the right track. It is very important that all stakeholders involved have an overview of the robot processes, and that we as developers have a troubleshooting tool to help us; that is the main reason to use custom dashboards.

Hope this helps! Good luck to you.


I have made a custom application for the same purpose. It can read log files from local as well as remote machines.

Features:

  1. Users can choose any log file they want.
  2. Users can filter logs by log level.
  3. Users can filter for a specific process within a specific time range.
  4. Users can filter by whether the message contains a given string.

Thank you very much, Jeevith, for such thorough feedback and for sharing!

Just one quick, last question: when you say you are using Splunk, are you storing all logs there, but not the general Orchestrator info?

For instance,

SQL Server → Assets / Machines / Processes and all the rest of info (standard) based on UiPath schema table

Splunk → All logs (and all sort of logs such as trace, fatal, info, warn, verbose…)

So you’ve changed the targets for log info through the NLog file, and in turn the logs are no longer written to SQL Server (only to Splunk, as per the NLog target mapping)?

Whoa, such an amazing initiative… Think about the possibility of selling it in the marketplace. I sincerely believe that a bunch of devs would be interested in your app =)


Hi @Felipe_Kotinda,

That is correct. The data flow is

Robot Execution log (Robot Machine) - .txt → Orchestrator log (Orchestrator Machine) - .txt →
Splunk Forwarder (on both Robot Machine and Orchestrator Machine) →
Splunk Index → Splunk Query → Splunk Dashboard / Report

One thing to note here is we use dedup (remove duplicates) in Splunk so that our dashboards do not contain results from both Robot and Orchestrator Logs.
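
To make the dedup idea concrete outside Splunk, here is a minimal Python sketch that keeps only the first event in each run of consecutive events sharing a timestamp. It only illustrates the idea behind `dedup _time consecutive=true`; it is not Splunk's implementation, and the event dictionaries and their `timestamp` / `source` keys are hypothetical.

```python
from itertools import groupby


def dedup_consecutive(events, key="timestamp"):
    """Keep the first event of each run of consecutive events sharing `key`."""
    return [next(group) for _, group in groupby(events, key=lambda e: e[key])]


# Hypothetical events: the same execution reported by both log sources.
events = [
    {"timestamp": "2021-03-04 10:15:30.123", "source": "robot"},
    {"timestamp": "2021-03-04 10:15:30.123", "source": "orchestrator"},
    {"timestamp": "2021-03-04 10:16:02.500", "source": "robot"},
]
unique_events = dedup_consecutive(events)
```

Because only consecutive duplicates are dropped, the events are assumed to be sorted by time before this runs.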

The retention policy is set to 1 year, after which we still have access to the .txt files on the robot and Orchestrator machines, so we can consume the .txt files again in the future. As a rough estimate, after close to 8 months of robot execution (6 processes), our .txt log files are around 400 MB on DEV and around double that in production, which is easy work for a tool like Splunk to parse.

All logs in Development and Production are also sent to a dedicated SQL database in the Dev and Prod environments, which we treat as a rainy day fund. If we stop using Splunk, we can still use the logs stored in SQL to create our dashboards in any other tool.

As you see, we try to keep a backup of everything. We never know when we might stop using Splunk. It is best to keep the logs simple and in a basic format, on an automatically backed-up drive within your organization. That gives you agility.

Starter help
Let me share our <targets> and <rules> tags here, as they are quite generic and contain no sensitive information. Anyone else wanting to customize NLog can use them as well and save a lot of time googling :)

As a team of 3, it took us a while to get the following targets tweaked to our needs. The TotalExecutionTimeSeconds=${event-properties:item=totalExecutionTimeInSeconds} is brilliant to have. The targets below let us slice and dice our queries by Process / Host / Robot / Time / Message / QueueName.

Create two variables in web.config for Robot and Orchestrator logging directory like so.

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" autoReload="true" throwExceptions="false" internalLogLevel="Off" internalLogFile="">
     <variable name="RobotLoggingDirectory" value="E:/Logs/UiPath/Robot" /> 
     <variable name="OrchestratorLoggingDirectory" value="E:/Logs/UiPath/Orchestrator" /> 
     <extensions>
     </extensions>
     <targets>
     </targets>
     <rules>
     </rules>
</nlog>

In the <targets> tag of web.config we log to SQL (This is usually a default in web.config file):

        <target xsi:type="Database" name="database" connectionStringName="Default" keepConnection="true">
          <commandText>
            INSERT INTO dbo.Logs (OrganizationUnitId, TenantId, TimeStamp, Level, WindowsIdentity, ProcessName, JobKey, Message, RawMessage, RobotName, MachineId, UserKey)
            VALUES (@organizationUnitId, @tenantId, @timeStamp, @level, @windowsIdentity, @processName, @jobId, @message, @rawMessage, @robotName, @machineId, @userKey)
          </commandText>
          <parameter name="@organizationUnitId" layout="${event-properties:item=organizationUnitId}" />
          <parameter name="@tenantId" layout="${event-properties:item=tenantId}" />
          <parameter name="@timeStamp" layout="${date:format=yyyy-MM-dd HH\:mm\:ss.fff}" />
          <parameter name="@level" layout="${event-properties:item=levelOrdinal}" />
          <parameter name="@windowsIdentity" layout="${event-properties:item=windowsIdentity}" />
          <parameter name="@processName" layout="${event-properties:item=processName}" />
          <parameter name="@jobId" layout="${event-properties:item=jobId}" />
          <parameter name="@message" layout="${message}" />
          <parameter name="@rawMessage" layout="${event-properties:item=rawMessage}" />
          <parameter name="@robotName" layout="${event-properties:item=robotName}" />
          <parameter name="@machineId" layout="${event-properties:item=machineId}" />
          <parameter name="@userKey" layout="${event-properties:item=userKey}" />
        </target>

web.config RobotLogs to custom RobotLogFile:

<target type="File" name="robotLogFile" fileName="${RobotLoggingDirectory}/${shortdate}_Execution.log" layout="${date:format=yyyy-MM-dd HH\:mm\:ss.fff} RobotName=&quot;${event-properties:item=robotName}&quot; Message=&quot;${message}&quot; ProcessName=&quot;${event-properties:item=processName}&quot; Machine=&quot;${machinename}&quot; Level=&quot;${level}&quot; TotalExecutionTimeSeconds=${event-properties:item=totalExecutionTimeInSeconds} TenantID=&quot;${event-properties:item=tenantId}&quot;" keepFileOpen="true" openFileCacheTimeout="5" concurrentWrites="true" encoding="utf-8" writeBom="true" />

web.config OrchestratorLogs to custom OrchestratorLogFile:

<target type="File" name="orchestratorLogFile" fileName="${OrchestratorLoggingDirectory}/${shortdate}_Execution.log" layout="${date:format=yyyy-MM-dd HH\:mm\:ss.fff} Message=&quot;${message}&quot; ProcessName=&quot;${event-properties:item=processName}&quot; Machine=&quot;${machinename}&quot; Level=&quot;${level}&quot; TotalExecutionTimeSeconds=${event-properties:item=totalExecutionTimeInSeconds} TenantID=&quot;${event-properties:item=tenantId}&quot;" keepFileOpen="true" openFileCacheTimeout="5" concurrentWrites="true" encoding="utf-8" writeBom="true" />
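
For anyone who wants to consume these files outside Splunk, here is a minimal Python sketch that parses one line written by the layouts above into a dictionary. It assumes the timestamp always occupies the first 23 characters (`yyyy-MM-dd HH:mm:ss.fff`) and that messages do not contain double quotes; the function name and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Matches Key="quoted value" or Key=bare_value (the bare value may be empty,
# e.g. when totalExecutionTimeInSeconds is not set on the event).
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_line(line):
    """Split one log line into a timestamp plus a dict of its fields."""
    stamp, rest = line[:23], line[23:]
    record = {"timestamp": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")}
    for key, quoted, bare in FIELD_RE.findall(rest):
        record[key] = quoted if quoted else bare
    return record


# Hypothetical line in the robotLogFile layout.
sample = (
    '2021-03-04 10:15:30.123 RobotName="R1" '
    'Message="Job x=1 execution ended" ProcessName="Invoices" '
    'Machine="HOST1" Level="Info" TotalExecutionTimeSeconds=42 TenantID="1"'
)
record = parse_line(sample)
```

All values come back as strings, so numeric fields such as TotalExecutionTimeSeconds still need converting before doing arithmetic on them.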

NLog needs a predefined rule set; the order of the rules matters here.

<rules>
      <logger name="Robot.*" ruleName="insightsRobotLogsRule" enabled="false" minlevel="Info" writeTo="insightsRobotLogs">
        <filters defaultAction="Ignore">
          <when condition="level >= LogLevel.Error or ends-with('${message}',' execution ended')" action="Log" />
        </filters>
      </logger>
      <logger name="Robot.*" minlevel="Info" writeTo="robotLogFile" />
      <logger name="*" minlevel="Info" writeTo="orchestratorLogFile" />
      <logger name="BusinessException.*" minlevel="Info" writeTo="businessExceptionEventLog" final="true" />
      <logger name="Robot.*" ruleName="primaryRobotLogsTarget" final="true" writeTo="database" />
      <logger name="Monitoring.*" writeTo="monitoring" minlevel="Warn" final="true" />
      <logger name="Quartz.*" minlevel="Warn" writeTo="eventLogQuartz" final="true" />
      <logger name="*" minlevel="Info" writeTo="eventLog" />
    </rules>
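
To see why the order matters, here is a simplified Python model of how the rules above are walked: rules are checked top to bottom, every matching rule contributes its target, and a matching rule with final="true" stops the walk. This is only a sketch of the behaviour, not NLog's code. It ignores the <filters> element, approximates name matching with `fnmatchcase`, and the logger names passed in (such as "Robot.Main") are hypothetical.

```python
from fnmatch import fnmatchcase

LEVELS = ["Trace", "Debug", "Info", "Warn", "Error", "Fatal"]

# (name pattern, minlevel or None, target, final, enabled), in file order.
RULES = [
    ("Robot.*", "Info", "insightsRobotLogs", False, False),
    ("Robot.*", "Info", "robotLogFile", False, True),
    ("*", "Info", "orchestratorLogFile", False, True),
    ("BusinessException.*", "Info", "businessExceptionEventLog", True, True),
    ("Robot.*", None, "database", True, True),
    ("Monitoring.*", "Warn", "monitoring", True, True),
    ("Quartz.*", "Warn", "eventLogQuartz", True, True),
    ("*", "Info", "eventLog", False, True),
]


def targets_for(logger_name, level, rules=RULES):
    """Return the targets an event reaches, in rule order."""
    rank = LEVELS.index(level)
    hits = []
    for pattern, minlevel, target, final, enabled in rules:
        if not enabled or not fnmatchcase(logger_name, pattern):
            continue
        if minlevel is not None and rank < LEVELS.index(minlevel):
            continue
        hits.append(target)
        if final:  # a matching final rule ends processing
            break
    return hits


robot_targets = targets_for("Robot.Main", "Info")

# Moving the orchestratorLogFile rule to the end puts it behind the final
# "database" rule, so robot events would no longer reach that file.
reordered = RULES[:2] + RULES[3:] + [RULES[2]]
robot_targets_reordered = targets_for("Robot.Main", "Info", reordered)
```

In this model the two file rules must sit above the final rules, otherwise the events are swallowed before they reach the custom log files.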

In Splunk
A query for Robot Runtime would be (we have a custom Message which reports robot runtime and also the QueueName)

index="YOUR_INDEX" host="YOUR_HOSTNAME" ProcessName="YOUR_PROCESS" Message="*The total execution time in seconds of dispatcher is :*QueueName :*" 
| dedup _time consecutive=true    ``` Remove Duplicates```
| rex "The total execution time in seconds of dispatcher is\s:\s(?<TotalExecutionTimeSecondsDis>\d{0,100})"   ``` Regex TotalExecutionTimeSecondsDis```
| convert auto("TotalExecutionTimeSecondsDis") as TotalExecutionTime   ``` Convert to a number ```
| eval SumExecutionTimeHours = sum(TotalExecutionTime)/(60*60)   ``` Convert to hours ```
| stats sum(SumExecutionTimeHours) as "TotalRobotExecutionTime"   ``` Sum hours ```
| fields TotalRobotExecutionTime  ``` Return only the required fields, ignore others ```
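
For readers without Splunk, the same calculation can be sketched in a few lines of Python: extract the seconds from each matching message, sum them, and convert to hours. The sample messages are hypothetical (our real message text may differ in spacing), and the sketch uses `\d+` rather than `\d{0,100}` so that a message without digits is skipped instead of matching an empty number.

```python
import re

RUNTIME_RE = re.compile(
    r"The total execution time in seconds of dispatcher is\s:\s(\d+)"
)


def total_runtime_hours(messages):
    """Sum the seconds reported in matching messages and convert to hours."""
    seconds = 0
    for message in messages:
        match = RUNTIME_RE.search(message)
        if match:
            seconds += int(match.group(1))
    return seconds / 3600


# Hypothetical messages, assumed to be deduplicated already.
messages = [
    "The total execution time in seconds of dispatcher is : 1800 QueueName : Invoices",
    "The total execution time in seconds of dispatcher is : 5400 QueueName : Invoices",
    "Some unrelated log message",
]
hours = total_runtime_hours(messages)
```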

For advanced Splunk users: yes, we can also use a single base query and build dashboards on it, but not reports. This way Splunk has to parse the logs / index only once per dashboard refresh instead of executing an individual query for each panel. Our Splunk team recommended using a base search.

Example of base search in Splunk

<search id="dispatcher">
    <query>
      index="YOUR_INDEX" host="$host_name$" ProcessName="Process_Dispatcher*" 
    </query>
    <earliest>$time_token.earliest$</earliest>
    <latest>$time_token.latest$</latest>
  </search>
  <search id="performer">
    <query>
      index="YOUR_INDEX" host="$host_name$" ProcessName="Process_Performer*" 
    </query>
    <earliest>$time_token.earliest$</earliest>
    <latest>$time_token.latest$</latest>
  </search>
  <row>

<panel>
      <single>
        <title>Robot Runtime (Hours)</title>
        <search base="performer">
          <query><![CDATA[
                   | dedup _time consecutive=true    ``` Remove duplicates ```
                   | rex "The total execution time in seconds of performer is\s:\s(?<TotalExecutionTimeSecondsDis>\d{0,100})"   ``` Extract TotalExecutionTimeSecondsDis ```
                   | convert auto("TotalExecutionTimeSecondsDis") as TotalExecutionTime   ``` Convert to a number ```
                   | eval SumExecutionTimeHours = sum(TotalExecutionTime)/(60*60)   ``` Convert to hours ```
                   | stats sum(SumExecutionTimeHours) as "TotalRobotExecutionTime"   ``` Sum hours ```
                   | fields TotalRobotExecutionTime  ``` Return only the required fields, ignore others ```
          ]]></query>
        </search>
        <option name="drilldown">none</option>
        <option name="height">84</option>
        <option name="numberPrecision">0.00</option>
        <option name="rangeColors">["0x53a051","0x0877a6","0xf8be34","0xf1813f","0xdc4e41"]</option>
        <option name="refresh.display">progressbar</option>
        <option name="underLabel">Timer</option>
      </single>
    </panel>
 </row>

Hope this helps you and others.
