Best Practices for organizing workflow that is largely contained in a Loop

bestpractices

#1

I have a workflow as follows:

1). Get a datatable with a list of payments made that day from database (approx. 15-30 per day)
2). Matches for these payments are found in a separate excel file an unknown amount of days prior to today. Each day contains one excel file. I’ll look back a max of 30 days
3). Loop through each excel file and look for matches for my datatable (can be more than one match in each excel file)
4). Perform fairly significant processessing for each match between the excel file and datatable
5). Continue looping until 30 excel files looked at, or all datatable items have been matched

Sorry if that’s too much info, but based on the above, steps 3-5 are all going to be contained in a Do While loop (checking if all datatable rows have been matched). There will be loops within loops & significant processing will be done during step 4. What is the general best practice for ease of readability, testing, etc. when creating this in UiPath? It seems like everything would have to be inside of a sequence-like activity if it is all contained inside of a Do While body, or is that not correct?


Invoke workflow vs embedded workflow?
#2

There’s never too much info :slight_smile:

Definitely this could be split into multiple, atomic (or at least - testable) parts.

A lot would depend on if you’re using Orchestrator (Queues could help a lot) or not, as the approaches would need to differ in significant ways, but the most crucial part is to get step 4 out of the loop.
This can be easiest achieved with Dispatcher-Performer setup (1st process builds queue, 2nd performs work), or multiple/combination of those.

One thing to note if we’re talking about Best Practices is that volume and scaling should impact design:

  • the lower the volume, the less scalable it needs to be (lowers design, implementation and possibly maintenance complexity cost)

  • volume should be judged for each step separately
    Example:
    Your initial payments have low volume and high chance of staying that way. That part doesn’t need to scale => Dispatcher can be linear
    Matching of payments to files has high amplitude of predicted execution time => Performer1 needs to be able to scale and/or performance is a key metric for design. This depends on what the “last 30 days” excel files look like.
    Actual processing for each matched item scales linearly (probably) => Performer2 doesn’t need to scale / performance is less weighted for design of this part

  • Process borders can be clearly defined
    Modularity of solution significantly increases maintainability. If you can split your project into parts that have clear, and independent, input/output sets, it’s much easier to design Behaviour (Scenario) tests for them, as well as enabling parallelization of implementation to multiple people/teams (just agree on what data goes in and out and you’re golden).

I don’t think this answers your question completely, but it’s a start. Lets let others chime in as well :slight_smile:

Btw - your post gave me a nice idea for something else. Thank you :slight_smile:

Edit:

And one last thing with Best Practices - unless they can be adopted by your whole team, they’re not a Best Practice for you. There is no silver bullet and even if something could be objectively better, if your maintenance/ops can’t handle how it works (after reasonable training), it’s a wrong design decision.
Just IMHO of course.


#3

Thank you for the very well though out, detailed, and informative response!

I am using Orchestrator, but I didn’t even think of setting up a queue.

So based on your response, would you recommend:

  • Process 1 = Get initial payments and setup queue1 where each item in queue1 = 1 initial payment
  • Process 2 = Look through previous days one-by-one to find excel file with the matching payment for each item in queue1
  • Process 3 = Setup queue2 for the processing where items = 1 initial payment (updated with location of excel file to use for each payment found in process 2)
  • Process 4 = Perform the processing for each item in queue2

Wouldn’t this cause the robot to possibly open up the same excel file multiple times in Process 4, thus causing a lot of redundancy? Or if multiple robots are handling the items in the queue, wouldn’t there be a good chance that an excel file would already be opened by a different robot? I guess if it’s just grabbing the info out of it, that might not be a significance performance hit. What it is doing in the excel files in process 4 is getting the payment details for each matched payment & using that info to create a journal entry. The details of each matched payment can be anywhere from 1 line to a few thousand lines. However, once grabbing the payment details, the robot doesn’t need to use the excel file for that transaction anymore, so I suppose that opening the same excel file multiple times isn’t that big of a deal, correct?

There’s a very good chance I misunderstood at least part of your post, so I thought I’d check to make sure I followed what you were saying :slight_smile:


#4

Glad you found it useful :slight_smile:
Of course all this is just IMHO :wink:

Process 2 and 3 could be merged - setting up next queue is usually the role of the previous process (otherwise you’re missing a way to transfer data further up the chain).

So if I were you I’d end up with something along the lines of:
Process 1 = Dispatcher of Initial payments
Process 2 = Performer (matching) of Initial payments AND Dispatcher for Processing of each item
Process 3 = Perfomer of processing of each item

Regarding dual-role of Process 2:
I’d say treat adding to the next queue as part of the previous transaction, because otherwise you might end up with a “Successfull” transaction in queue1 and… nothing in queue2.

Either way key is to identify what is a Transaction in your process. And don’t limit yourself to 1 process = 1 transaction. Since most processes involve multiple steps, divide and conquer.
Even if it looks from a business user perspective that something is a single thing, in reality it rarely is.

Think of the classic example of using an ATM - it’s definitely 1 transaction from user perspective. And from usability perspective it has to fully succeed or be rolled back.
But inside of it (implementation details that users shouldn’t be concerned about) there are steps that are in itself transactions. Failure of each of those transactions will make the whole Transaction fail. But that doesn’t make them unimportant, or from this topics perspective - not useful. Quite the opposite.

Depends on how you look at it.
From one perspective, yes, same file could be opened multiple times.
On the other hand, these are separate transactions and each transaction should be ACID. So even though the same file might be read for multiple transactions, which might seem redundant, it’s done for separate transactions. And that separation is important.

But, you raise a good point:

The matching part is still a little fuzzy to me, but I’ll just make a couple of assumptions to move forward and hopefully my point will still bring you some value.

Depending on the volume of that data to search through (those last 30 excels), you might be able to cache it, although I’d be careful with that.

There’s also the relation of data.
If I understand you correctly:
1 initial payment will be found in exactly 1 file
1 initial payment might consist of multiple lines in that file
1 file can contain data for multiple payments

This is where you need to see (as in - measure) if that performance hit of searching same file multiple times is acceptable or not and decide (implementation detail!) what the best way to effectively represent those relationships and matching as a workflow.

For a start, I’d do a quick test - read and merge datatables from all the 30 files. Or better yet - a little more. Pick your peak month for it. Or fudge so that you know that if it peaks it still won’t crash.
If you can read all, keep it in memory and then match -> that will be easy.

If you can’t… cheat. Or to be precise - cut the corners by triming the data.
Keep just the thing that you use for matching and the filename and nothing more. If it’s still too much, remove duplicates (if there really can be a few thousand lines for 1 initial payment, that will cut memory usage significantly). Or actually remove duplicates anyway, they won’t be needed for this purpose.
That way you will know in which file that transaction data can be found. Not necessarily all the lines, but you went down from 30 files… to 1.
Now you could check just that one file and get concrete lines (or their numbers).

If you can’t do that either, you could take 1 transaction (initial payment) and search through all files, rinse and repeat for next ones.

What I’d definitely avoid (from ACID compliance perspective) is to take all transactions and search through individual files. That’s a huge risk since failure (as in - exception) at any point could (and probably would) impact other matchings.

Assuming you get the matching done one way or the other, from here you could either store that (filename or filename + line numbers, or whatever will be needed for actual processing) into the queue2 or…
Make new files. Storage is cheap and it probably won’t be GB (maybe a few MB?).

So you could (by whichever path you took to this point) get all the rows needed for that transaction and write them to a new file in full. Now you could store that file name/path in the queue2 with the database data and achieve full isolation of transactions - each queue item consists of data (or “links”/paths to data) that enable full processing for this transaction and for this transaction only.

Whatever will happen to that particular file (f.e. robot hangs and locks the file) will not impact any other one. And you can safely scale horizontally ( = more or less robots as needed) for Process 3 / queue2, and you can retry, as well as mock (as in - create dummies) for test purposes to your hearts content.
With that design (clear separations) you could prepare manual input data (just make a robot that fills it into queue2) and start developing from that part, because that’s where the business value lies.

Edit:
Wrt last sentence (about where the value is)… I might’ve jumped to conclusions a little bit. There’s definitely value in that part, but that doesn’t mean there isn’t any in other parts, or even more there - it all depends on what is the major pain point and if having a potentially shippable part with one of them will have value on it’s own. YMMV as they say.
/Edit

That got a little lengthy, didn’t it? :slight_smile:
Hope it makes sense.

So while my previous post focused on identifying process boundaries and per-process scaling needs , this one focuses on identifying what is the Transaction (and sub/mini transactions) in your process, how to leverage queues to achieve scaling and how to transfer and separate data.

Anything I missed?
If someone has other opinions, add on!
If you disagree - add on even more! Improving is fun :slight_smile:

PS. (Following advise from previous post Edit)
Don’t try to shoehorn your design of the process as I’m writing it here if you feel it’s suboptimal for you (for whatever reason - complexity, design time, maintenance, some constraints you can’t share etc.). All I’m trying to do is provide options and different perspective. After all, it’s your process :slight_smile:

PPS. But try to use queues. They make life much easier and I don’t think it’s controversial to say so :wink: