Auto-restart of failed jobs in Orchestrator

Hi Team,

We already have an option under JOBS to restart a given job for each process whenever we need to.

But it would be great to have an option so that, if a bot fails due to some exception (any exception), Orchestrator can automatically restart that process.

This is the same thing developers do manually when a bot fails: they look at the error and then restart the process.

This could be added as an Enable/Disable setting while creating the PROCESS itself, e.g.:
Auto Restart - Enable / Disable

If the option is enabled, the job is restarted automatically when it fails due to an exception; otherwise it is not.

This would save developers the time they spend monitoring bots and restarting the jobs that failed.

Cheers

Hey @Palaniyappan

Could you please provide some more context? In a typical scenario, wouldn’t this just result in multiple failed processes, one after the other?

I just found this thread while looking for info on enabling this, so I will give you the scenario I ran into. We have a fairly stable automation that runs morning and evening, Monday-Friday, to change certain client communication parameters per specific clients’ requests. However, this Friday evening I missed the notification that the process had failed:

RemoteException wrapping System.ArgumentException: The Computer Vision server encountered an error.
[500]

I agree that if the setting were as simple as re-run on all errors it could often cause infinite loops of failed jobs, but in a case like this, where an important job failed due to what was almost certainly just a momentary network issue, it would be nice to know that the job could run again.

Thank you for the extra context, I get it now. I will leave it up to the Orchestrator team to consider it for the future :slight_smile:

Hey Team, has there been any update on the release of this feature?
I would like to reiterate that this is quite a useful one to have, at the Orchestrator level.

Additional control parameters for this feature would be great (see the sketch right after this list):

  1. Retry/rerun only a certain number of times
  2. Retry/rerun only on certain conditions - e.g. on final status: Failed, Terminated, etc.
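
Purely to illustrate the shape such per-process settings could take (none of these fields exist in Orchestrator today; every name below is made up), a small sketch:

```python
# Hypothetical per-process auto-restart policy; all field names are illustrative only.
from dataclasses import dataclass


@dataclass
class AutoRestartPolicy:
    enabled: bool = False                               # the Enable/Disable switch requested above
    max_attempts: int = 2                               # 1. retry/rerun only a certain number of times
    retry_on_states: tuple = ("Failed", "Terminated")   # 2. retry/rerun only on certain final statuses
    retry_if_info_contains: tuple = ()                  # optional: only when the job's error text matches
    delay_minutes: int = 5                              # wait a bit before restarting


# Example: restart at most 3 times, only on a failed final status.
policy = AutoRestartPolicy(enabled=True, max_attempts=3, retry_on_states=("Failed",))
```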

I agree. This is an important feature for mission-critical apps.

Hi, I agree this would be a great feature! Does anyone know if this is being considered by UiPath?

I’ll add another use case:

[screenshot of the failed job]

This process failed to run today. When I manually restarted it, it ran fine. It would have been great if this had tried to automatically restart.

After thinking about it, you can hack this to work by building a dispatcher that submits a queue item.

That queue item then triggers the process you really care about. You can set the queue to retry items X number of times, so if the process fails before completing the queue item, it will be tried again on the next half hour.
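
In case it helps, here is a minimal sketch of that dispatcher side as a script against the Orchestrator OData API (the URL, folder id, queue name and token below are placeholders, and you still configure “Max # of retries” on the queue itself):

```python
# Minimal dispatcher sketch: add a queue item so a queue trigger starts the real process.
# ORCH_URL, FOLDER_ID, QUEUE_NAME and ACCESS_TOKEN are assumptions - adjust to your setup.
from datetime import datetime, timezone

import requests

ORCH_URL = "https://your-orchestrator.example.com"
FOLDER_ID = "123456"                       # OrganizationUnitId of the folder holding the queue
QUEUE_NAME = "Start_ImportantProcess"      # queue watched by a queue trigger
ACCESS_TOKEN = "<bearer token>"            # obtained through your usual auth flow


def add_queue_item(reference: str) -> None:
    """Adds one queue item; the queue's own retry setting then gives the re-run behaviour."""
    payload = {
        "itemData": {
            "Name": QUEUE_NAME,
            "Priority": "Normal",
            "Reference": reference,
            "SpecificContent": {"RequestedAt": reference},
        }
    }
    resp = requests.post(
        f"{ORCH_URL}/odata/Queues/UiPathODataSvc.AddQueueItem",
        json=payload,
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "X-UIPATH-OrganizationUnitId": FOLDER_ID,
        },
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    add_queue_item(datetime.now(timezone.utc).isoformat())
```

If the performer faults before completing the item, the queue marks it failed and, with retries enabled, puts a fresh copy back so the trigger starts the job again.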

I concur. We are still at version 21.10, so I don’t know if this has been fixed in 23.10.

Here is one use-case that I imagine everybody has experienced:
[screenshot of the error message]
This error happens when a job that was previously pending, waiting for a PC to become available, finally starts. The error is ironic in itself, as Orchestrator’s primary task is to be able to start jobs. However, this error has been around since we started using Orchestrator back in 2018.

Ideally the error itself should be fixed, as it is probably easy to fix, but it is a current example of a “restart job on condition” use case.
If the job fails with an error containing a given string, in this example “0x800700AA”, the job could automatically restart after X time, on X machine. It could also include a maximum number of attempts a job should auto-restart.

Another use case could be any process failing during the initialize-applications phase, i.e. before any data has been processed or any queue has been modified; there, this auto-restart feature could be handy, again with a condition determined by the end user.
For example, if job ACME fails and the job Info contains the string “Something went wrong initializing apps”, Orchestrator could retry the job.
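
Until something like this exists natively, one way to approximate “restart job on condition” is an external watcher run on a schedule. The sketch below polls faulted jobs and restarts those whose Info contains a given string; the endpoints are from the public Orchestrator OData API, but the placeholders, field names and start strategy are assumptions you should verify against your Orchestrator version:

```python
# Sketch of a "restart job on condition" watcher, run e.g. every few minutes by a scheduler.
# ORCH_URL, FOLDER_ID and ACCESS_TOKEN are placeholders; verify fields against your API version.
import requests

ORCH_URL = "https://your-orchestrator.example.com"
FOLDER_ID = "123456"
ACCESS_TOKEN = "<bearer token>"
RETRY_IF_INFO_CONTAINS = ("0x800700AA", "Something went wrong initializing apps")
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "X-UIPATH-OrganizationUnitId": FOLDER_ID,
}


def faulted_jobs() -> list:
    """Returns the most recently faulted jobs in the folder."""
    resp = requests.get(
        f"{ORCH_URL}/odata/Jobs",
        params={"$filter": "State eq 'Faulted'", "$orderby": "EndTime desc", "$top": "20"},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["value"]


def release_key(release_name: str) -> str:
    """Looks up the ReleaseKey needed to start a new job for the same process."""
    resp = requests.get(
        f"{ORCH_URL}/odata/Releases",
        params={"$filter": f"Name eq '{release_name}'"},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["value"][0]["Key"]


def restart(job: dict) -> None:
    """Starts one new job for the faulted job's process."""
    payload = {
        "startInfo": {
            "ReleaseKey": release_key(job["ReleaseName"]),
            "Strategy": "ModernJobsCount",  # assumption: modern folder; classic folders differ
            "JobsCount": 1,
        }
    }
    resp = requests.post(
        f"{ORCH_URL}/odata/Jobs/UiPath.Server.Configuration.OData.StartJobs",
        json=payload,
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    for job in faulted_jobs():
        info = job.get("Info") or ""
        if any(marker in info for marker in RETRY_IF_INFO_CONTAINS):
            restart(job)
```

A real version would also need to cap attempts per process (and remember which faulted jobs it already retried) so that it does not recreate the infinite-loop problem raised earlier in the thread.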

We have several robots building large queues over the weekend and, hopefully, processing them before Monday morning.
Come Monday morning, quite often something has gone wrong, and the normal approach is simply to restart the jobs, which is comically simple to do.