Facing issues with document understanding - beginner

kelly07.19 · May 22, 2021, 11:34am

Hi, I am currently working on a project that requires me to rename files belonging to four main document types; let's say A, B, C, and D. I have to extract the company name, date, document type, and title, and rename each file using these extracted elements. What is troubling me is that my documents are image-based and may be of poor quality. How should I go about this? Furthermore, each main document type (A, B, C, D) contains many files that sometimes deviate from each other. How doable is this project?

RPAForEveryone · May 22, 2021, 3:47pm

Hi @kelly07.19
Welcome to the UiPath Community!

UiPath Document Understanding provides a sophisticated system for document classification and data extraction. While image quality is an important factor in successful classification and extraction, this project should be doable as long as the documents have a reasonable structure. That said, it may require advanced knowledge of the Document Understanding module if the project gets more complex.
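
Once the four fields have been extracted, the rename step itself is the easy part. Here is a minimal sketch in plain Python (not a UiPath activity) of how a file name could be assembled. The function names, the underscore separator, the ISO date format, and the ".pdf" extension are all assumptions for illustration; adjust them to your own naming convention.

```python
import re
from datetime import date


def sanitize(part: str) -> str:
    """Replace characters that are invalid in common file systems, then collapse whitespace."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", part)
    return re.sub(r"\s+", " ", cleaned).strip()


def build_filename(company: str, doc_date: date, doc_type: str, title: str,
                   ext: str = ".pdf") -> str:
    """Join the extracted fields into one file name (hypothetical naming convention)."""
    parts = [sanitize(company), doc_date.isoformat(), sanitize(doc_type), sanitize(title)]
    return "_".join(parts) + ext


# Example with made-up extracted values
print(build_filename("Acme / Pte Ltd", date(2021, 5, 22), "A", "Annual  Report: 2020"))
```

Sanitizing each field matters here because OCR output from poor scans often contains stray punctuation that is not allowed in file names.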

I would recommend starting small: train on a contained set of documents first. This will help with two things:

1. You will find out whether the quality of the documents you're currently using is good enough for Document Understanding to work accurately.
2. It will give you (or the developer, if it's someone else in your team) an initial taste of the Document Understanding module.

One more thing to consider:
In most cases, applying the 80-20 rule of thumb will help. Most likely, the process will not receive all documents or document variations in equal proportion.
Focus on the most-used templates, which will maximise automation and minimise manual handling. If you can cover the top 80% of the document workload, it's likely that the last 20% will fall into the "effort-outweighs-the-benefit" category. If not, you can always add more templates/variations as you become more proficient with the tool.
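
To make the 80-20 idea concrete, here is a rough sketch in plain Python of how you could decide which templates to tackle first, assuming you have a count of how many documents arrive per template. The function name, the template names, and the counts are made up for illustration.

```python
from collections import Counter


def templates_to_cover(counts: dict, target_percent: int = 80) -> list:
    """Return the most frequent templates needed to reach the target share of volume."""
    total = sum(counts.values())
    covered = 0
    chosen = []
    for name, n in Counter(counts).most_common():
        # Integer comparison avoids floating-point surprises at the threshold.
        if covered * 100 >= target_percent * total:
            break
        chosen.append(name)
        covered += n
    return chosen


# Hypothetical monthly volumes per template
volumes = {"A-invoice": 50, "B-contract": 25, "C-letter": 15, "D-misc": 10}
print(templates_to_cover(volumes))
```

With these example numbers, the first three templates already reach 90% of the volume, so the fourth could be left for manual handling at first.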

I hope this helps. Sorry for the long post.

kelly07.19 · May 23, 2021, 5:09am

Thank you, sir. Have a great day.

Evelyn_Lim · May 23, 2021, 5:33am

Hello there, which OCR engine would you recommend for scanning image-based PDFs?

Pradeep.Robot · September 18, 2021, 6:51am

Go for UiPath Document OCR (if you prefer licensed); if not, OmniPage would be equally good.

sonaliaggarwal47 · September 18, 2021, 12:47pm

Hi @Evelyn_Lim @bhupender_Rawat,

Different OCR engines can work better for different types of documents.

It's always good to try another engine if one doesn't extract details to the desired accuracy.

Additionally, I am sharing below the OCR accuracy matrix that was presented at UiPath DevCon; it should give you a good idea of how different OCR engines perform across document types.
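
One rough way to compare engines on your own documents is to type out the correct text for a few sample pages and score each engine's output against it. The sketch below uses Python's standard difflib for a character-level similarity score. The engine names and sample strings are placeholders, and this is only an assumed way of scoring, not an official UiPath accuracy metric.

```python
from difflib import SequenceMatcher


def similarity(expected: str, actual: str) -> float:
    """Character-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, expected, actual).ratio()


def rank_engines(ground_truth: str, outputs: dict) -> list:
    """Sort engines from best to worst match against the manually typed text."""
    scores = {engine: similarity(ground_truth, text) for engine, text in outputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder OCR outputs for one sample page
truth = "Acme Pte Ltd 2021-05-22"
outputs = {
    "engine_x": "Acme Pte Ltd 2021-05-22",
    "engine_y": "Acrne Pte Ltd 2O21-O5-22",
}
for engine, score in rank_engines(truth, outputs):
    print(engine, round(score, 3))
```

Running this over a handful of pages per document type gives a quick indication of which engine suits which type.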

Hope this helps.

Regards
Sonali

Ravis · October 14, 2022, 3:46pm

I am facing this issue when I click on Taxonomy.
(I have already installed .NET.)
What is the solution to this error?

After clicking Yes, I get this message:
