Document Understanding Best Practices / REFramework

Dear Community,

I’m currently trying to build a template for the whole Document Understanding process/life cycle, and I’ve run into a few questions about best practices when combining Document Understanding with the REFramework.

This is the basic process of document understanding according to UiPath:

In my case, I’ve also added a step 3.5: another validation station, like the one in step 5, for documents whose classification confidence is below a certain threshold.
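The routing rule for that extra step can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the 0.8 threshold, the `ClassificationResult` type and the step names are assumptions, not UiPath APIs; in a real workflow the confidence would come from the classifier's output.

```python
from dataclasses import dataclass

# Hypothetical threshold: the post only says "a certain confidence level".
CLASSIFICATION_THRESHOLD = 0.8


@dataclass
class ClassificationResult:
    """Hypothetical stand-in for a classifier's output."""
    document_id: str
    document_type: str
    confidence: float  # 0.0 to 1.0


def route_classification(result: ClassificationResult,
                         threshold: float = CLASSIFICATION_THRESHOLD) -> str:
    """Return the name of the next step for a classified document."""
    if result.confidence < threshold:
        # Step 3.5: a person confirms or corrects the document type
        return "classification_validation"
    # Confident enough: go straight on to data extraction
    return "data_extraction"


print(route_classification(ClassificationResult("doc-1", "invoice", 0.95)))
print(route_classification(ClassificationResult("doc-2", "invoice", 0.42)))
```

Keeping the threshold as a parameter makes it easy to tune per document type once you see how often people end up correcting the classifier.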

So my process basically looks like this:

My big question is how to split up the process to maximize parallelism. The process can’t simply wait until a user has finished their classification or extraction tasks, because I might have to extract data from hundreds or thousands of documents.

So I was thinking about putting queues between many of the steps and having several different bots (or rather, processes).

For example, once a document is classified, the classification results are added to a queue for further processing, regardless of whether the bot or a person did the classification. The same goes for data extraction: the results are added to a queue for further processing. I’ll probably also need queues for retraining the models.

This would mean splitting my process into three parts connected by queues: digitization + classification, data extraction, and further processing of the extracted data.

Each of those parts would then use the REFramework.
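A minimal Python sketch of that three-part split, with in-memory `queue.Queue` objects standing in for Orchestrator queues. The stage functions, queue names and the placeholder classification/extraction logic are hypothetical; in practice each stage would be a separate process reading from a real queue.

```python
import queue
from typing import Dict, List

# In-memory stand-ins for the queues between the stages (names are hypothetical).
classified_q: "queue.Queue[Dict]" = queue.Queue()
extracted_q: "queue.Queue[Dict]" = queue.Queue()


def digitize_and_classify(files: List[str]) -> None:
    """Part 1: digitize and classify each file, then hand it to the next stage."""
    for path in files:
        # Placeholder for real digitization + classification (or human validation)
        doc_type = "invoice" if path.startswith("inv") else "other"
        classified_q.put({"file": path, "type": doc_type})


def extract_data() -> None:
    """Part 2: drain the classification queue, extract fields, pass them on."""
    while True:
        try:
            item = classified_q.get_nowait()
        except queue.Empty:
            return
        # Placeholder for real extraction (and, if needed, human validation)
        item["fields"] = {"source_file": item["file"]}
        extracted_q.put(item)


def process_results() -> List[Dict]:
    """Part 3: drain the extraction queue for downstream processing."""
    done = []
    while True:
        try:
            done.append(extracted_q.get_nowait())
        except queue.Empty:
            return done


digitize_and_classify(["inv_001.pdf", "letter_002.pdf"])
extract_data()
results = process_results()
print([(r["file"], r["type"]) for r in results])
```

Because the stages only share queues, each one can be scaled on its own by assigning more robots to whichever stage is the bottleneck.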


Another idea would be to create an Orchestration Process and run it for every single file that needs to be processed. This is definitely easier to implement, but at some point there might be hundreds of processes or more waiting for user input, and I’m not sure how that is handled.


The first approach definitely scales by simply adding more robots; I’m not sure about the second.

It’d be great if we could discuss these approaches, or if you could share your own way of tackling the whole Document Understanding process.

T0Bi

Hi @T0Bi!

Your observations are absolutely spot on!

We are currently exploring both approaches you mentioned:

  1. Splitting the processing flow into smaller sub-processes that pass data to each other through queues. Orchestration Processes will do just fine here; you don’t necessarily need the REFramework, as long as you take care of exception handling and retry mechanisms.
  2. Using an Orchestration Process to process an input file end-to-end
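Without the REFramework, the retry part can be as small as a wrapper that retries transient (system) failures but not business-rule failures, mirroring how the REFramework treats those two kinds of exceptions differently. A minimal sketch; `BusinessRuleError`, `run_with_retries` and the retry count are hypothetical names and values, not UiPath APIs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BusinessRuleError(Exception):
    """Hypothetical: the data itself is wrong, so retrying will not help."""


def run_with_retries(step: Callable[[], T], max_retries: int = 2,
                     delay_s: float = 0.0) -> T:
    """Run `step`, retrying system errors up to `max_retries` extra times."""
    attempt = 0
    while True:
        try:
            return step()
        except BusinessRuleError:
            raise  # business exceptions are reported, never retried
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # out of retries: surface the system exception
            time.sleep(delay_s)


# Usage: a step that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_extraction() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("extraction service timed out")
    return "extracted"


print(run_with_retries(flaky_extraction))  # succeeds on the third attempt
```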

In the first scenario, make sure you don’t leave any queue items In Progress when a suspension point is reached. Otherwise, if the action is not completed within 24 hours, the item will be marked as Abandoned.
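The failure mode can be modelled in a few lines. This is a simplified, hypothetical model of queue item statuses, not Orchestrator's implementation: the 24-hour limit is the figure given in this reply, and `hand_off_to_validation` illustrates one possible way to avoid the problem, namely closing the queue item before the process suspends and carrying only a reference into the pending action.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional

ABANDON_AFTER = timedelta(hours=24)  # limit quoted in the thread


class Status(Enum):
    NEW = "New"
    IN_PROGRESS = "In Progress"
    SUCCESSFUL = "Successful"
    ABANDONED = "Abandoned"


@dataclass
class QueueItem:
    reference: str
    status: Status = Status.NEW
    started_at: Optional[datetime] = None


def start(item: QueueItem, now: datetime) -> None:
    item.status = Status.IN_PROGRESS
    item.started_at = now


def hand_off_to_validation(item: QueueItem, pending: List[str]) -> None:
    """Hypothetical: close the item before suspending; only its reference waits."""
    item.status = Status.SUCCESSFUL
    pending.append(item.reference)


def sweep(items: List[QueueItem], now: datetime) -> None:
    """Mark items that stayed In Progress longer than the limit as Abandoned."""
    for item in items:
        if (item.status is Status.IN_PROGRESS and item.started_at is not None
                and now - item.started_at > ABANDON_AFTER):
            item.status = Status.ABANDONED


t0 = datetime(2021, 1, 1, 9, 0)
left_open = QueueItem("doc-1")
closed_first = QueueItem("doc-2")
pending_actions: List[str] = []

start(left_open, t0)  # suspended while still In Progress
start(closed_first, t0)
hand_off_to_validation(closed_first, pending_actions)

sweep([left_open, closed_first], t0 + timedelta(hours=25))
print(left_open.status.value, "|", closed_first.status.value)
```

After the 25-hour sweep, the item that was left open is Abandoned, while the one closed before suspension stays Successful and its reference is still waiting for a person.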

In the second scenario, a dispatcher process simply starts a job for every input file. Scaling it is as simple as adding more robots to the “processing pool” (environment or modern folder, depending on the case).
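The dispatcher pattern can be sketched with a thread pool, where each submitted task stands for one job and the pool size stands for the number of robots in the processing pool. `process_document` and `dispatch` are hypothetical placeholders for the Orchestration Process and the job-starting logic; real jobs would be started through Orchestrator rather than run locally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_document(path: str) -> str:
    """Hypothetical placeholder for the end-to-end Orchestration Process."""
    # digitize -> classify -> (validate) -> extract -> (validate) -> export
    return f"{path}: done"


def dispatch(files: List[str], robots: int) -> List[str]:
    """Start one job per input file; `robots` caps how many run at once."""
    with ThreadPoolExecutor(max_workers=robots) as pool:
        # map() keeps results in input order, whatever order the jobs finish in
        return list(pool.map(process_document, files))


print(dispatch(["a.pdf", "b.pdf", "c.pdf"], robots=2))
```

Scaling up means raising `robots`; the dispatcher itself does not change.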

Just for awareness, without any promises on availability: we are working on an out-of-the-box Studio RPA template that would implement the logging, error handling and retry mechanisms specific to Document Understanding processes.

Cheers,
Alex.

Hi @Alexandru-Luca

thanks for your answer!

I hadn’t considered that leaving queue items In Progress would get them “Abandoned” after a while; that’s a really good point.

For now I’ll stick with a dispatcher and an Orchestration Process, since it seems like the easiest way.

I’m looking forward to seeing such a template; I think it would make the Document Understanding process faster and easier to implement.

Cheers,
T0Bi