Job starts on a robot on the wrong machine

Hello,

We have a recent problem that appeared after updating our robots to 20.4.3. For some reason, two jobs start under the same user on two different machines, even though the robot with that user is configured to use only one of the machines.

For context, we use Classic Folders on our on-premise Orchestrator, version 2020.4.2.

In this image you can see two instances of the job (they were not running concurrently; more on that later). We named the robot 28-01 after the server it is connected to, VDA028. But as you can see from the image, the job list shows that the job also started on VDA013, our other robot machine, even though robot 28-01 is specifically connected to VDA028 (as is required in Classic Folders).

What is even more puzzling is what we see in the logs generated by the job that ran on the wrong machine. The logs start like this, and you can immediately spot that one run is from a different machine than the other:

As you can see, the job started twice, and on top of that, with the same job key. The previous image was taken from the job logs; since the two runs share the job key, they both show up in the same log view. Below you can see details from the two “execution started” entries. The job key is the same, but the machine is different! The user, however, is the same, and so is the robot.

[images: the two “execution started” log entries]
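
For anyone who wants to do the same check outside the log view: below is a minimal sketch (my own, not an official UiPath snippet) of pulling all log entries for a single job key through the Orchestrator OData API. The URL, tenant, credentials, and job key are placeholders, and the RobotLogs field names are as I read them in the public API docs, so adjust them to your version.

```python
# Hypothetical sketch: fetch all log entries that share one job key and
# print them, so two machine names under a single key become visible.
# Everything marked "placeholder" must be replaced with real values.
import requests

BASE = "https://orchestrator.example.local"           # placeholder URL
JOB_KEY = "00000000-0000-0000-0000-000000000000"      # placeholder job key

# On-premise authentication (classic token flow)
auth = requests.post(f"{BASE}/api/Account/Authenticate", json={
    "tenancyName": "Default",                         # placeholder tenant
    "usernameOrEmailAddress": "admin",                # placeholder user
    "password": "secret",                             # placeholder password
})
token = auth.json()["result"]

resp = requests.get(
    f"{BASE}/odata/RobotLogs",
    params={"$filter": f"JobKey eq {JOB_KEY}", "$orderby": "TimeStamp"},
    # Depending on your folder setup, you may also need the
    # X-UIPATH-OrganizationUnitId header here.
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for entry in resp.json()["value"]:
    # RawMessage is the JSON the robot emitted; it contains machineName,
    # so two distinct machine names under one job key reproduce the bug.
    print(entry["TimeStamp"], entry["RawMessage"])
```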

The two jobs then run concurrently (which in itself causes headaches, as they share the robot user). Another problem arises when one of the two tangled jobs ends: at that moment, Orchestrator flags the job (that is, the job with this job key) as finished and marks the robot available. But in reality the robot is not yet available, because the other of the two tangled jobs is still running!

Orchestrator then starts a new job on the same robot (as it should; the robot is marked as available, after all), but that job fails with the following error:

[image: error message]

This error message was the initial fault we found, but we were able to trace it back to having two jobs running simultaneously with the same job key, on the same robot, but on two different machines.

If you need more details or I didn’t manage to explain things clearly enough, I’ll be happy to elaborate. But has anyone else here encountered the same issue, and if so, are there any fixes we could apply?

PS.

  • We already deleted and re-added our robots with new names; it didn’t help.
  • The job is started by a queue trigger.

EDIT:

  • The same problem also occurs when starting jobs with just X runtimes, not only via triggers.
  • The problem occurs seemingly at random, about every 1–2 hours.
  • All robots are unattended.
  • We deleted and re-added the machines (and the robots as well). No help; it still keeps happening.
  • A job ran from a time trigger with the input “allocate dynamically: 1 times”. It still started twice, with a single job key.

Can you please tell me whether the bots are attended or unattended?

Unattended; we don’t have any attended robots.

@ovi Can you help him?

I also contacted UiPath Support yesterday; I’ll post any updates and findings here as well.

I was able to replicate the issue on our test setup. Steps to reproduce are quite simple:

We need a setup with two robots on separate machines and a job that can run on both:

  • Install UiPath robot to two machines (the usual stuff)
  • Create machines A and B and connect the robot services
  • Create two robots, one on machine A and one on machine B (let’s call them X and Y respectively)
  • Create one environment, add both robots to it
  • Publish a project; any process will do (I used one containing only a 15-second delay)
  • Add it to the environment
  • Set up a schedule to run the job (I ran it every 30 s just to speed things up). Strategy: allocate dynamically, 1 runtime.
  • Wait

After a while, you can find jobs that were started on, say, robot X but ran on both machines A and B (still as robot X), even though B and X are not connected.

Checking our Kibana dashboards, I can see that out of 580 or so jobs, there are 3 instances where a job started twice.
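
If you don’t have Kibana at hand, the same check can be done against the API. Here is a rough sketch (my own, with placeholder URL and token) that pages through /odata/Jobs and flags every job key that reports more than one host machine:

```python
# Sketch: group Orchestrator jobs by job key and flag any key that was
# reported on more than one host machine. URL and token are placeholders;
# on classic folders you may also need the X-UIPATH-OrganizationUnitId header.
from collections import defaultdict
import requests

BASE = "https://orchestrator.example.local"           # placeholder URL
HEADERS = {"Authorization": "Bearer <token>"}         # placeholder token

machines_by_key = defaultdict(set)
skip = 0
while True:
    resp = requests.get(
        f"{BASE}/odata/Jobs",
        params={
            "$select": "Key,HostMachineName",
            "$orderby": "Id",     # stable ordering for paging
            "$top": 100,
            "$skip": skip,
        },
        headers=HEADERS,
    )
    resp.raise_for_status()
    page = resp.json()["value"]
    if not page:
        break
    for job in page:
        # Skip pending jobs that have no host machine yet.
        if job["HostMachineName"]:
            machines_by_key[job["Key"]].add(job["HostMachineName"])
    skip += len(page)

# Every key should map to exactly one machine; more than one is the bug.
for key, machines in machines_by_key.items():
    if len(machines) > 1:
        print(f"job {key} ran on {sorted(machines)}")
```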

Hi @tuokyh
Do you have a solution for this problem?
It’s happening to me too! :disappointed_relieved:

Hello,

It’s nice to see I’m not alone with this issue, though I do feel your pain.

Unfortunately there is no solution yet; I’m eagerly waiting for a patch that fixes this.

I contacted UiPath support a while ago, they have been able to replicate the issue, and it has been forwarded internally. So I’m confident the issue will be solved in a future update.

EDIT: I forgot to mention that Modern Folders don’t seem to have this issue, so if your processes can work in one, switch over and the problem is gone. Our processes unfortunately require Classic Folders, so…

Hi @tuokyh
Thanks for your reply.
I found a solution to my problem; it was similar to yours, but not the same.
I use Modern Folders, not Classic Folders.
Best Regards!

This is a bug in Orchestrator for which we are looking to release a patch. Timeline is not yet established.


Hello @CosminV,

Is this fixed yet? Does the 20.10 LTS version have this issue?

Thanks & Regards,
Nithin


Hi @Nithin_P,

I can confirm that the issue is solved in the newest versions of 2019.4 and 2020.10.


Same issue on version 20.10.4; any news on a possible timeline for a fix?
In our case, triggers are useless because of this bug.


Hi @achindris

Would you mind creating a support ticket with our technical support?

This issue should already be fixed in this version, so we would like to investigate it.

Hi There,

I’m having the same issue with Orchestrator version 2020.10.7 and Modern Folders. Could you please provide support? Thanks

Hi @fabio.blonda

You might want to contact our technical support so you can provide more information to our team about this issue.

They should be able to provide all the available context about the issue and offer some workarounds :slight_smile:

Hi guys,
We are on 2021.10.0 and we seem to be seeing a similar issue.
We are using API calls to create a trigger job in Orchestrator. The job looks correct, but when it runs it uses the wrong robot. For reference, a sketch of this kind of call is below.
Does anyone else see this, or know of a workaround?
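
This is roughly the shape of the call; a minimal sketch assuming the standard StartJobs endpoint, with placeholder URL, token, folder id, release key, and robot id. (Strategy values differ between Classic and Modern Folders; “Specific” with explicit RobotIds is the classic way to pin a job to one robot.)

```python
# Minimal sketch of starting a job via the Orchestrator API -- assumed to
# be the StartJobs endpoint; all ids and the token are placeholders.
import requests

BASE = "https://orchestrator.example.local"            # placeholder URL
HEADERS = {
    "Authorization": "Bearer <token>",                 # placeholder token
    "X-UIPATH-OrganizationUnitId": "1",                # placeholder folder id
}

body = {
    "startInfo": {
        "ReleaseKey": "00000000-0000-0000-0000-000000000000",  # placeholder
        "Strategy": "Specific",      # pin the job to the robots listed below
        "RobotIds": [42],            # placeholder robot id
        "InputArguments": "{}",
    }
}
resp = requests.post(
    f"{BASE}/odata/Jobs/UiPath.Server.Configuration.OData.StartJobs",
    json=body,
    headers=HEADERS,
)
resp.raise_for_status()
# The response lists the created jobs; print each job key for tracing.
for job in resp.json()["value"]:
    print(job["Key"])
```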
Damian