Monitoring of running processes

Hey guys!

I hope you’re all okay.

We currently run a little over 90 processes on 8 licenses. When we have a random network problem, some of our robots hang and don’t recover until someone notices that the process is stuck for some reason, be it an unexpected screen, an unexpected network update, and so on. Even with timeouts set up to report an error, in some cases they don’t work so well.

To better deal with these cases, I thought of creating a robot that runs every 30 minutes, checks all the processes that are currently running and, if any of them has gone more than X amount of time without sending any logs, sends me an alert so I can check whether the queue is stuck or not.
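As a rough sketch of that check (not a finished solution), the core logic could look something like the Python below. Everything here is an assumption for illustration: the `find_stale_jobs` helper, the dictionary keys (`name`, `state`, `start_time`, `last_log_time`) and the 45-minute threshold are hypothetical. Fetching the running jobs and their latest log timestamps from Orchestrator, and sending the alert, are left out.

```python
from datetime import datetime, timedelta, timezone


def find_stale_jobs(jobs, now, max_silence):
    """Return the running jobs that have been silent for longer than max_silence.

    Each job is a plain dict with hypothetical keys: "name", "state",
    "start_time" and "last_log_time" (None if the job has not logged yet).
    """
    stale = []
    for job in jobs:
        # Only jobs that are actually executing can be "stuck" in this sense.
        if job["state"] != "Running":
            continue
        # If the job never logged anything, measure silence from its start time.
        last_activity = job.get("last_log_time") or job["start_time"]
        if now - last_activity > max_silence:
            stale.append(job)
    return stale


# Made-up sample data standing in for what the monitoring robot would collect.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_jobs = [
    {"name": "Invoices", "state": "Running",
     "start_time": now - timedelta(hours=2),
     "last_log_time": now - timedelta(minutes=5)},
    {"name": "Payroll", "state": "Running",
     "start_time": now - timedelta(hours=3),
     "last_log_time": now - timedelta(minutes=90)},
]

for job in find_stale_jobs(sample_jobs, now, timedelta(minutes=45)):
    print(f"Possible stuck job: {job['name']}")
```

Running this every 30 minutes and alerting only on the returned jobs would avoid the flood of emails, since a job that is still logging is never reported.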

I believe the best way to do this is with the Orchestrator API, but before that, I would like to know if you have already run into this problem and what approach you use for better queue management.

We have processes that need to run at predefined times, and if the queue gets stuck for any reason, we end up with another automation not being executed.

Have you ever had the same problem? If so, how did you solve it?

Note: Orchestrator’s alert system doesn’t help much here, because it sends a notification for every error or alert. I don’t need all of those emails, only the ones for processes that got stuck.

Thanks!

If your processes are running out of control like that, then they’re not written properly. Error handling is an important part of things.

You can also configure the Trigger so that it kills the Job after a set amount of time.

Hello!
Thanks for answering.

Yes, this functionality exists, but I don’t think it is very effective for these cases. Basically, the “stop after” counter starts as soon as the process enters the queue, not when it starts executing. So if we configure the process to stop after 2 hours and it stays in the pending queue for 3 hours waiting for the previous process to finish, it never even starts executing, because the 2-hour period has already elapsed.
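To put numbers on it, here is a small Python sketch of the arithmetic. It only models the behavior as I described it above (counter measured from queue time); the function names and the timestamps are made up for illustration, and a cutoff measured from the actual start time is shown as the alternative a custom watchdog could apply.

```python
from datetime import datetime, timedelta


def starts_before_cutoff(queued_at, started_at, stop_after):
    """True if the job gets to start before a cutoff counted from queue time."""
    return started_at < queued_at + stop_after


def deadline_from_start(started_at, max_runtime):
    """Cutoff counted from the moment the job actually starts executing."""
    return started_at + max_runtime


# Hypothetical timeline: queued at 08:00, pending for 3 hours, 2-hour limit.
queued_at = datetime(2024, 1, 1, 8, 0)
started_at = queued_at + timedelta(hours=3)
limit = timedelta(hours=2)

# Counted from queue time, the window closes at 10:00, before the 11:00 start.
print(starts_before_cutoff(queued_at, started_at, limit))

# Counted from the actual start, the job would get its full 2 hours.
print(deadline_from_start(started_at, limit))
```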

At least for our business, this functionality is not that useful.