Dev Dives: Mastering AI-powered Document Understanding

Hi UiPath Community,

Are you eager to supercharge your document processing with the powerful duo of automation and AI? If so, mark your calendars for September 28 because you won’t want to miss the next Dev Dives episode!

I’m Lahiru Fernando, Country Director/ RPA Lead (APAC)/ Practice Lead for Document Understanding & AI at Boundaryless Group and UiPath MVP. I’m excited about the upcoming Dev Dives session, where I’ll be sharing my insights and experiences with Document Understanding.

Together with Daniel Lerner, AI/ML Solution Architect at UiPath, we’ll guide you through the best practices of Document Understanding, share learnings from successful customer deployments, reveal how to measure business results effectively and give you a sneak peek into what’s on the horizon for Document Understanding.

By the end of the session, you will be able to use Document Understanding like a pro!

So make sure to join us (online) on September 28. Save your seat following the links below:

EMEA/APAC: Thu, Sep 28, 3:00PM (CEST)

AMER: Thu, Sep 28, 9:00AM (EDT)

Can’t wait to see you there!

Hi @Lahiru.Fernando ,

I’ve been following you on YouTube and your in-depth tutorials have helped me a lot with my DU projects and I’m incredibly grateful for your contributions to the community.

It's good to see that the Document Understanding module is receiving continuous improvements, but I still have some questions that I hope you can address either here or during the session.
How many layouts/batches can a single Document Understanding Model accommodate to reliably extract data?
In our current project, we don’t have an estimate of the number of layout types (there are far too many), but we’ve managed to narrow it down to 27 layouts. Currently, we’ve deployed a single ML model for the 27 batches, and the extraction results aren’t very good.

Also, it’s difficult to obtain 200 documents for each of the 27 layouts (i.e., 5,400 docs in total), so that is another challenge.

Are there any suitable pre-processing steps we can adopt to improve the OCR performance?
Receipts always give me a headache because the text isn’t always clear (well, it is to me, but not to the OCR). The text may appear in various orientations, and this affects the model’s ability to detect and group tokens.

Can Unstructured Documents or multiple documents on single page be handled with NER & Document Understanding?
I understand that DU can only work with semi-structured documents, but we’ve had scenarios where the client/customer submits multiple documents on a single page, and sometimes a single document consisting of multiple logical sub-documents, e.g., a 10-page document that consists of 5 invoices. These are unstructured and have started to increase in number.

I have a LOT of questions but decided to stop here.
I’d really appreciate it if you could address some of these questions either here or during the session.
Thanks in advance!

Kind Regards,
Ashwin A.K

Hello @ashwin.ashok

Sorry for not being quick in responding. I just saw your post. Here are my answers…

How many layouts/batches can a single Document Understanding Model accommodate to reliably extract data?

As far as I know, there is no limit on the number of batches a single model can handle. We have had several use cases where more than 200 vendors were sending invoices, which means 200+ layouts. Our approach for this is:

  • Identify the most frequent layouts
  • Collect sample documents from each layout (making sure we have enough samples from each layout, starting with 20)
  • Train the model
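
As a rough illustration of the first two steps, here is a minimal Python sketch that ranks layouts by how often they appear in the collected documents and flags the ones still short of the 20-sample starting point. The function name, the layout labels and the idea of keeping one label per document are assumptions for illustration only; this is not part of the UiPath tooling.

```python
from collections import Counter

MIN_SAMPLES = 20  # starting sample size per layout, as mentioned above


def plan_training_set(doc_layouts, top_n=None, min_samples=MIN_SAMPLES):
    """Rank layouts by frequency and split them into 'ready' and 'short'.

    doc_layouts: iterable with one (hypothetical) layout label per document.
    Returns (ready, short):
      ready -> [(layout, sample_count)] with at least min_samples documents
      short -> [(layout, missing_count)] that still need more samples
    """
    counts = Counter(doc_layouts)
    ranked = counts.most_common(top_n)  # most frequent layouts first
    ready = [(layout, n) for layout, n in ranked if n >= min_samples]
    short = [(layout, min_samples - n) for layout, n in ranked if n < min_samples]
    return ready, short
```

Layouts in the "short" list are the ones to collect more samples for before training.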

Since you mentioned the extraction results are not so good in your case, I highly recommend you do an Evaluation of the model to identify where it lacks in terms of training. This will require you to run an evaluation pipeline.

Are there any suitable pre-processing steps we can adopt to improve the OCR performance?

Yes, there are different methods we can use depending on the scenario. One of the pre-processing steps we apply is to convert the document to grayscale and increase the contrast where required. This can make the dark text a bit clearer for the OCR. In addition, try to set some standards for how people send the documents to you. This is part of process improvement and standardization for better accuracy.
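
To make the grayscale and contrast idea concrete, here is a minimal pure-Python sketch that works on a small grid of RGB pixels. It is only an illustration of the technique: the function names are made up, the luma weights are the common ITU-R BT.601 ones, and the linear stretch is just one of several ways to raise contrast. In a real workflow you would typically do this with an image library before passing the file to the OCR engine.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to rows of 0-255 gray values."""
    # ITU-R BT.601 luma weights (an assumption; other weightings exist)
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def stretch_contrast(gray_rows):
    """Linearly stretch gray values so the darkest pixel becomes 0
    and the brightest becomes 255."""
    flat = [p for row in gray_rows for p in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy
        return [row[:] for row in gray_rows]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray_rows]
```

A faint scan whose values sit in a narrow band (for example 100 to 150) is spread over the full 0 to 255 range, so the text stands out more against the background.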

Can Unstructured Documents or multiple documents on single page be handled with NER & Document Understanding?

Yes. We can connect Document Understanding with NER. I have done this in another project for some unstructured documents. The Digitize Document activity gives you the Document Text. This goes as the input for the NER model. Depending on the scenario, you may need to do a little bit of cleansing before submitting the text to NER. Example: removing extra line breaks, special characters etc.
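
A minimal sketch of that kind of cleansing in Python, assuming the digitized text is available as a plain string. The function name and the set of characters treated as "special" are assumptions; adjust them to whatever your NER model expects.

```python
import re


def clean_for_ner(text):
    """Normalize digitized text before sending it to an NER model."""
    # Normalize Windows/Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace characters outside letters, digits, whitespace and
    # common punctuation with a space (the allowed set is an assumption)
    text = re.sub(r"[^\w\s.,:;/@%&()$#'-]", " ", text)
    # Collapse runs of spaces and tabs
    text = re.sub(r"[ \t]+", " ", text)
    # Trim spaces around line breaks, then collapse repeated line breaks
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```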

It can be tricky when you have multiple documents on a single page. However, if these are added as Images, we can use PDF activities to get the images extracted into a list and process them separately using the new Document Understanding activity pack. If a single file has multiple documents spread across different pages, you can try applying an Intelligent Keyword Classifier to split those documents. The output of this classification can be used to process each identified document separately and extract the data.
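
The splitting idea can be sketched in Python as follows. This is not the Intelligent Keyword Classifier itself, only a simplified stand-in: it assumes you already have the text of each page and a hypothetical keyword per document type that marks the first page of a new document.

```python
def split_by_keywords(page_texts, start_keywords):
    """Group consecutive pages into documents.

    page_texts:     list of page texts, in page order
    start_keywords: {document_type: keyword} (hypothetical); a page that
                    contains a keyword is treated as the start of a new
                    document of that type
    Returns a list of {"type": ..., "pages": [1-based page numbers]}.
    """
    documents = []
    for page_no, text in enumerate(page_texts, start=1):
        lowered = text.lower()
        label = next(
            (doc_type for doc_type, kw in start_keywords.items()
             if kw.lower() in lowered),
            None,
        )
        if label is not None or not documents:
            # Keyword found (or very first page): start a new document
            documents.append({"type": label or "unknown", "pages": [page_no]})
        else:
            # No keyword: the page continues the current document
            documents[-1]["pages"].append(page_no)
    return documents
```

Each resulting page range can then be processed as its own document for extraction.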

I hope this helps. I’m sure you have a lot of questions, and maybe more based on my reply. Feel free to connect with me so I can help you.

Check out my Dev Dives Follow Up video as well:

Thank you for your detailed response @Lahiru.Fernando , I appreciate it!

How many layouts/batches can a single Document Understanding Model Accommodate to reliably extract data?

UiPath recommends using a model per parent layout, i.e., if we had 4 different layouts for a receipt, 10 for a prescription and 3 for a ledger, we would have to train them separately, each with an appropriate taxonomy.
However, on further questioning, UiPath support couldn’t comment on whether this would give us the results we were expecting (the customer wanted numbers, i.e., the expected % improvement from adopting this new strategy).
Also, the dataset was uneven since we couldn’t find enough sample documents for some layouts, and excluding them was not an option either as the customer deemed them important, which leads me to this question:

If I have 10 layouts with 20-40 training documents for the first 7 and 150-200 documents for the last 3, should I subtract documents from the last 3 to ensure that I have a balanced dataset if all layouts have to be considered? I understand that the document count is low, but there is a risk of overfitting which will affect its performance.

Are there any suitable pre-processing steps we can adopt to improve OCR performance?

Converting the image to Grayscale could be an option, I might try this. We are using UiPath Document Understanding OCR btw.

Can Unstructured Document or multiple documents on single page be handled with NER & Document Understanding?

This question was more of a follow up on the video series you had started on NER with DU and I’m looking forward to what you have in store for us!

Kind Regards,
Ashwin A.K