I have a couple questions about how to structure a workflow when using the enhanced ReFramework.
Let’s assume a standard process that uses a ‘dispatcher’ to get data and upload it to an Orchestrator queue, then a ‘processor’ to actually pull the transactions from the queue and process them. Let’s also assume that the process requires multiple robots (2+) to do the processing portion.
In the past, I’ve handled this by having one job as the dispatcher and scheduling it on one robot only, then having a second job as the processor and scheduling it shortly after the dispatcher on however many robots are needed. However, the ReFramework seems to recommend using a FirstRun service to do the dispatching on the first run only and then process the rest. This works well if only one robot is doing the processing, but how can you handle it with multiple robots doing the processing?
Also, if there is a system error (e.g. had to restart the computer), how do you avoid running the dispatcher when you re-run the process again?
I’m trying to envision a way to do it without running a separate ‘dispatcher’ job, but am struggling to figure out how it would actually be implemented. Any advice would be appreciated.
Currently the execution of FirstRun is decided by a flag in the Config file, but if the package is deployed by Orchestrator, then we cannot easily change that. One option would be to get that flag from an asset, but that’d require some manual intervention when you want to skip the dispatching phase.
I believe this also depends quite a bit on the source of data and how you handle new items. For example, if you get data from unread emails, mark them as read and then add that data to a queue, you can run the dispatcher multiple times, since it won’t pick up the already-read messages in subsequent runs.
What kind of transaction are you dealing with? Maybe the easiest way is to mark the dispatched items, so that the FirstRun doesn’t dispatch the same thing multiple times.
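To make the "mark the dispatched items" idea a bit more concrete, here is a rough sketch in plain Python (not UiPath activities). The names `dispatch_new_items`, `dispatched_refs` and `add_to_queue` are made up for illustration, and it assumes every source item carries some unique reference. In a real process the marker would have to live somewhere persistent (the source itself, a file, a database) rather than in memory.

```python
def dispatch_new_items(source_items, dispatched_refs, add_to_queue):
    """Queue only the items whose reference has not been dispatched before.

    source_items    -- iterable of dicts with a unique "reference" key (assumed)
    dispatched_refs -- set of references already dispatched (the "mark")
    add_to_queue    -- hypothetical callable that uploads one item to the queue
    """
    added = []
    for item in source_items:
        ref = item["reference"]
        if ref in dispatched_refs:
            continue  # already dispatched in an earlier run, skip it
        add_to_queue(item)
        dispatched_refs.add(ref)  # mark it so a re-run does not add it again
        added.append(ref)
    return added


# Running the dispatcher twice over the same data adds each item only once.
queue = []
seen = set()
items = [{"reference": "INV-1"}, {"reference": "INV-2"}]
dispatch_new_items(items, seen, queue.append)  # first run queues both items
dispatch_new_items(items, seen, queue.append)  # second run queues nothing
```

With something like this in place, it no longer matters much whether FirstRun executes again after a restart, since a repeated dispatch becomes a no-op.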
What are your thoughts on using Queues with multiple robots to update files shared with the process owner/user?
A typical complex process will do many tasks, but after each of those tasks it will almost always update each item with its result, so the end user (or developer) knows it was completed. It does this per item, not at the end of the process, so if a failure happens the file still shows the progress, and the process can use that info to continue from where it left off on its next run.
Additionally, another goal has been to somehow prevent failures when that file is in use, such as by creating duplicate filenames or even locking the file (but locking it in a way that lets the robot still use it).
So, am I the only one who has doubts about using Queues for this reason, or is there a recommended solution for managing files? My current thought is that maybe we should wait until the file is not in use, but at the same time not get stuck waiting for eternity.
Yes, that’s the way to go, but the asset must have different values per robot, as you want only one to receive the signal to dispatch items.
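Roughly, the per-robot flag could work like the sketch below. This is plain Python just to show the logic. The mapping `dispatch_flag_by_robot` stands in for an asset with a different value per robot, and the robot names and the `dispatch` / `process` callables are hypothetical.

```python
def should_dispatch(robot_name, dispatch_flag_by_robot):
    """Return True only for a robot whose asset value enables dispatching."""
    # Robots without an explicit value default to "do not dispatch".
    return dispatch_flag_by_robot.get(robot_name, False)


def run_job(robot_name, dispatch_flag_by_robot, dispatch, process):
    """Same package on every robot: dispatch if flagged, then always process."""
    if should_dispatch(robot_name, dispatch_flag_by_robot):
        dispatch()
    process()


# Stand-in for a per-robot asset: only ROBOT-01 gets the signal to dispatch.
flags = {"ROBOT-01": True, "ROBOT-02": False, "ROBOT-03": False}
log = []
for name in flags:
    run_job(name, flags,
            dispatch=lambda n=name: log.append((n, "dispatch")),
            process=lambda n=name: log.append((n, "process")))
```

Skipping the dispatch phase after a restart would then mean setting that one robot’s value to False before re-running, which is the manual step mentioned earlier.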
We don’t use multiple robots on a queue that requires updating the same file for exactly the reasons you mentioned.
Sometimes it’s doable to get around it, but usually it isn’t worth it. For example, one process deals with a large Excel sheet that contains multiple transactions within it. We needed multiple robots to work on it due to time constraints. In that case we created a “PreProcessor” job (which also acted as dispatcher), along with a “Processor” job and a “PostProcessor” job. The PreProcessor would take the current day’s Excel file and split it into many different Excel files (one for each transaction) based on business rules, then add a queue item for each one specifying which Excel file to use.
The Processor would then grab a queue item and process the individual Excel file. The PostProcessor was queued up to run and would start on one robot after the Processor was finished. This robot merged all the Excel files back into one Excel file, then deleted the individual Excel files. It was a bunch of unnecessary files created and a lot of extra work, but it worked out in the end. Honestly, it’s not really recommended except as a last resort though.
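For anyone who wants to picture the split and merge steps, here is a simplified sketch in Python. It uses CSV files instead of Excel to stay dependency-free, and the function names, the `key` column and the file naming are all made up for illustration (our real split followed business rules, and the key values here are assumed to be safe to use as filenames).

```python
import csv
from pathlib import Path


def split_by_transaction(rows, key, out_dir):
    """PreProcessor step: write one CSV per distinct value of `key`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    paths = []
    for value, group in groups.items():
        path = out_dir / f"{value}.csv"  # assumes `value` is a valid filename
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)  # each path would become one queue item
    return paths


def merge_and_cleanup(paths, merged_path):
    """PostProcessor step: merge the per-transaction files, then delete them."""
    merged_rows = []
    for path in paths:
        with Path(path).open(newline="") as f:
            merged_rows.extend(csv.DictReader(f))
    if not merged_rows:
        return []  # nothing to merge, leave everything untouched
    with Path(merged_path).open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(merged_rows[0].keys()))
        writer.writeheader()
        writer.writerows(merged_rows)
    for path in paths:
        Path(path).unlink()  # remove the individual files after merging
    return merged_rows
```

Since every robot only ever touches its own file, nothing has to be locked. The cost is the extra pre- and post-processing jobs.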
That’s interesting though. So like basically add to the queue a process for pulling in all the transactions as they finish and updating the one Excel file, or something like that.
Yeah, I cannot think of any direct/simple way of doing this concurrency control with multiple robots currently.
Alternatives include a split-merge approach (like @Dave mentioned) or implementing something with a semaphore (kind of related to what you said last).
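As a rough idea of what the semaphore option could look like, here is a minimal Python sketch of a lock file with a timeout, so a robot waits for the shared file but does not wait forever. The name `file_lock` and the `.lock` naming are assumptions for illustration. It relies on exclusive file creation being atomic, which may not hold on every network share, and a robot that crashes while holding the lock would leave a stale lock file that needs cleaning up.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=30.0, poll_interval=0.5):
    """Hold an exclusive lock file while the body runs; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"could not acquire {lock_path} within {timeout}s")
            time.sleep(poll_interval)  # another robot holds the lock, retry
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock for the next robot
```

A robot would then wrap its update of the shared file in something like `with file_lock("report.xlsx.lock"):` and treat a `TimeoutError` as a reason to retry the item later (the file name here is only an example).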
In any case, thanks for the idea. We’ve received similar feedback about this kind of feature before, so it might show up in a future version!
@qbrandon reminded me of another option to do that: using job parameters.