Best Practice: Do you regularly utilize Try Catch on the whole workflow?

Hi guys, thanks for all your contributions to this topic, especially @octechnologist and @ClaytonM.

I want to revive this thread because as RPA is evolving, so should our best practices and frameworks. With more automated processes and more robots come different challenges, so this post is regarding flexible and scalable exception handling in rapidly changing IT-environments.

Consider this:
A new and unexpected exception comes up in a system used by a lot of robots. Robots cannot recover from this by simply restarting within the ReFramework. A human can log on to the machine and perform a workaround, but the error might come again the next day. IT can’t help and so your robotics operations team is stuck handling the same error on several machines to keep things running. This takes a lot of time.

Now, robots should be able to perform the workaround themselves, but with a high number of processes it takes a lot of time to implement the workaround and republish every process. Also, if one thing is for sure, it is that the next (major) exception is always coming! The exception handling should be flexible and scalable!

One idea is to publish an ExceptionHandling workflow to Orchestrator as a library and then use this library in all processes to - obviously - handle exceptions if they occur. When the next exception comes up, the robotics team can update the ExceptionHandling library to include a workaround for the new exception, publish to Orchestrator and use the Mass Project Dependencies Update Tool to equip all the processes with the new ExceptionHandler.
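To make the idea concrete, here is a minimal sketch in Python rather than UiPath, so treat it as an analogy and not as the actual library. The names `register_workaround`, `handle_exception`, `run_step` and `PopupBlockedError` are all made up for illustration. The shared module keeps a registry of known workarounds, and every process only calls the one entry point, so adding a new workaround means updating the shared module and not each process.

```python
from typing import Callable, Dict, Type

# Hypothetical registry: maps an exception type to its workaround function.
_WORKAROUNDS: Dict[Type[BaseException], Callable[[BaseException], bool]] = {}


def register_workaround(exc_type):
    """Decorator that registers a workaround for one exception type."""
    def decorator(func):
        _WORKAROUNDS[exc_type] = func
        return func
    return decorator


def handle_exception(exc):
    """Return True if a registered workaround reports that it recovered."""
    # Walk the class hierarchy so a handler for a base class
    # also covers its subclasses.
    for cls in type(exc).__mro__:
        handler = _WORKAROUNDS.get(cls)
        if handler is not None:
            return handler(exc)
    return False


class PopupBlockedError(Exception):
    """Hypothetical error: a required popup was blocked."""


@register_workaround(PopupBlockedError)
def _reenable_popups(exc):
    # Placeholder for the real workaround steps.
    return True


def run_step(step):
    """Run one process step; retry once if the shared handler recovered."""
    try:
        return step()
    except Exception as exc:
        if handle_exception(exc):
            return step()
        raise
```

The retry-once policy in `run_step` is just one possible choice. Unknown exceptions are re-raised unchanged, so the normal framework handling still applies to them.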

This way your robotics operations team can quickly react to global issues in the IT environment and implement a fix for all your robots with comparatively little work.

Do you guys have experience and/or best practices with some kind of flexible and scalable exception handling? What do you think of the scenario explained above? Am I missing anything? I’m excited for your feedback.

Thanks,
Lukas


What type of IT-environment errors are we talking about?

I would assume some to be server downtime, but most errors to be user-profile related.

If the server or user is inaccessible, then this would need to be solved from the Robot-allocation side, or IT needs to be on top of the issues more effectively (which they never are!). Orchestrator is supposed to allocate jobs only to accessible robots, but this isn’t always reliable, since a user profile could still show ‘connected’ without really working. Technically, you could build a monitor job that runs frequently to check for accessibility problems or user profile corruption and mitigate these types of issues. :man_shrugging:

The other types of issues, which are more common, are the user-profile related ones. Examples are Internet Options being reset (which turns off popups or changes security settings and causes websites to stop working), a Chrome extension not working, application versions that differ or are not configured, and other things. With these types of errors the robot itself still functions, but the process fails on one user profile while working on another.

I do believe that most of the Framework components should be deployed as a Library, anyway. However, I’m not sure that applying this solution (i.e., fixing the user profile settings) as part of the Framework Library, which requires mass updates, is really necessary. This is because you can run a separate job that fixes these things, and you can optimize the Framework to effectively use multi-threading to continually start jobs on working robots.

So, the question is, is it better to schedule a separate job that checks and fixes user-setting issues? If you include this type of thing as part of each job, it could add several minutes of process time per day and also require mass updates when changes are needed.
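A rough sketch of what such a separate job could look like, written in Python only to show the shape of it. `SettingCheck` and `run_maintenance` are hypothetical names, and the actual checks and fixes (popup settings, browser extension and so on) would have to be supplied for your environment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SettingCheck:
    name: str
    is_ok: Callable[[], bool]  # returns True when the setting is healthy
    fix: Callable[[], None]    # attempts to restore the setting


def run_maintenance(checks: List[SettingCheck]) -> List[str]:
    """Run every check once and return the names that are still broken."""
    still_broken = []
    for check in checks:
        if check.is_ok():
            continue
        check.fix()
        # Verify again so a failed fix is reported and not silently ignored.
        if not check.is_ok():
            still_broken.append(check.name)
    return still_broken
```

Scheduled once per robot user, this keeps the validation time out of the business processes, and the returned list is what a human would still need to look at.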

I also mentioned ‘multi-threading’. I believe this is a very good way to get around user-profile setting issues, because when the job fails a transaction, the Framework will recognize that it needs to trigger the job again to get back to the specified number of robots that it’s set to run on. I have had a lot of success with this.

For the effective use of multi-threading, some optimizations will be needed, even in my own projects. In the end, the job would know how many robots it wants to run transactions with (a dynamic number that keeps a certain number of robots available for other critical jobs) and would start a concurrent job if transactions are available.
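The calculation behind that dynamic number could look roughly like this. It is a Python sketch with a hypothetical function `jobs_to_start`; in a real process the inputs would come from Orchestrator, which is not shown here. It also assumes that each job that is already running will pick up one of the pending transactions next.

```python
def jobs_to_start(pending_transactions, running_jobs,
                  available_robots, reserved_robots, max_jobs):
    """Return how many additional jobs to trigger right now."""
    # Robots we may use after holding some back for other critical jobs.
    usable = max(available_robots - reserved_robots, 0)
    # Do not exceed the configured ceiling for this process.
    headroom = max(max_jobs - running_jobs, 0)
    # Assumption: each running job takes one pending transaction next,
    # so only the remainder justifies starting more jobs.
    unclaimed = max(pending_transactions - running_jobs, 0)
    return min(usable, headroom, unclaimed)
```

If a job dies on a broken user profile, `running_jobs` drops on the next check and the result goes up again, which is the self-healing effect described above.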

So maybe if this is combined with notifications when a setting issue is detected (for example, when a robot fails several consecutive times), resolving these things becomes more efficient.
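Detecting the "fails consecutive times" condition could be sketched like this, again in Python and with made-up names (`FailureTracker`, `record`). The threshold of 3 is an arbitrary default, and how the notification is actually sent is left out.

```python
from collections import defaultdict


class FailureTracker:
    """Counts consecutive failures per robot."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._streaks = defaultdict(int)

    def record(self, robot, succeeded):
        """Record one result; return True when a notification is due."""
        if succeeded:
            # Any success resets the streak for that robot.
            self._streaks[robot] = 0
            return False
        self._streaks[robot] += 1
        # Notify only once, when the streak first reaches the threshold.
        return self._streaks[robot] == self.threshold
```

Notifying only at the moment the threshold is reached avoids sending a new message for every further failure on the same robot.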

Sorry if these ideas are a bit unorganized :sweat_smile:

Thanks
Clayton


Hey Clayton,

thanks for the response. I must have read your answer ten times by now, but I’m still struggling to understand the whole multi-threading concept.

I was not talking about any specific kind of error, but some can surely be categorized as server issues. What I get from your answer is that you would recommend building a separate job to fix user profile settings etc. and then continuing with the original job?

Does multi-threading mean that these two jobs would run in parallel on the same user and machine?


Multi-threading here means that the job runs transactions on multiple different users and machines at the same time. So, if one user / machine combination is broken, the process still runs, because the other user / machine combinations are working and keep executing the transactions.

To be effective, though, each running job would need to check the number of running jobs to ensure it starts a new job if one fails. It’s also possible that newer versions of Orchestrator have an autorun feature for Queues, but that is not something I have explored.

That is my initial thinking, because if every job that runs does some type of setting validation, that could take considerable time away from business processes throughout the day. Server configuration probably only needs to be checked once daily.


Hey @code_monkey. I see that you were successful with this, but for some reason I cannot get mine to work. I am trying to use Try Catch within a For Each loop. You can see my issue here, and if you have any ideas, I would appreciate your advice!