Facing issues with Document Understanding - beginner

Hi, I am currently working on a project that requires me to rename files belonging to four main document types. Let's say the main document types are A, B, C and D. I have to extract the company name, date, document type and title, and rename each file using these extracted elements. What is troubling me is that my documents are image-based and can be of bad quality. How should I go about doing this? Furthermore, each of the main document types A, B, C and D contains many files that sometimes deviate from each other. How doable is this project?

Hi @kelly07.19
Welcome to the UiPath Community!

UiPath Document Understanding provides a sophisticated system of document classification and data extraction. While image quality is an important factor in successful classification and extraction, this project should be doable as long as there is reasonable structure to the documents. That said, it might require advanced knowledge of the Document Understanding module if the project gets more complex.
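
Once the four fields from the question (company name, date, document type, title) have been extracted, the renaming step itself is the easy part. Below is a minimal Python sketch of that step only, not of the UiPath workflow. The function name `build_filename`, the underscore separator, the ISO date format and the `.pdf` default are all assumptions made for illustration.

```python
import re
from datetime import date


def build_filename(company: str, doc_date: date, doc_type: str,
                   title: str, ext: str = ".pdf") -> str:
    """Join the four extracted fields into one file-system-safe name.

    Hypothetical helper: the field order and separator are assumptions.
    """
    def clean(text: str) -> str:
        # Replace characters that Windows rejects in file names,
        # then collapse any run of whitespace into a single space.
        text = re.sub(r'[<>:"/\\|?*]', " ", text)
        return re.sub(r"\s+", " ", text).strip()

    parts = [clean(company), doc_date.isoformat(), clean(doc_type), clean(title)]
    return "_".join(parts) + ext


print(build_filename("Acme / Co", date(2021, 3, 5), "A", "Annual  Report: 2020"))
```

Cleaning the extracted text matters here because OCR output from poor scans often contains stray punctuation that is not valid in a file name.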

I would recommend starting small: train on a contained set of documents first. This will help with two things:

  1. You will find out if the quality of the documents you’re currently using is good enough for document understanding to work accurately.
  2. It will give you (or the developer, if it’s someone else on your team) an initial taste of the Document Understanding module.

One more thing to consider:
In most cases, applying the 80-20 rule of thumb may help. Most likely, the process will not receive all documents or document variations in equal proportion.
Focus on the most used templates, which will maximise automation and minimise manual handling. If you can cover the top 80% of the document workload, it’s likely that the last 20% will fall into the “effort-outweighs-the-benefit” category. If not, you can always add more templates/variations as you become more proficient with the tool.
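
To make the 80-20 idea concrete, here is a rough Python sketch that counts how often each template appears in a sample of received documents and picks the smallest set of templates reaching a target share of the volume. The function name `templates_to_cover` and the template labels are hypothetical; in practice you would label a sample of your own documents by hand.

```python
from collections import Counter


def templates_to_cover(template_labels, target=0.8):
    """Return the most frequent templates that together reach `target`
    share of the sample, plus the share actually covered."""
    counts = Counter(template_labels)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    chosen, covered = [], 0
    # Walk templates from most to least frequent until the target is met.
    for template, n in counts.most_common():
        if covered / total >= target:
            break
        chosen.append(template)
        covered += n
    return chosen, covered / total


# Hypothetical sample: 100 documents spread over four template variants.
sample = ["A1"] * 50 + ["B1"] * 30 + ["C1"] * 15 + ["D1"] * 5
print(templates_to_cover(sample))
```

The templates left out of the returned list are the candidates for manual handling until there is time to add them.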

I hope this helps. Sorry for the long post.

Thank you sir, have a great day.

Hello there, which OCR engine would you recommend for scanning image-based PDFs?

Go for UiPath Document OCR if you prefer a licensed engine; if not, OmniPage would be equally good.

Hi @Evelyn_Lim @bhupender_Rawat,

Different OCR engines can serve better for different types of documents.

It’s always good to try another engine if one doesn’t extract details to the desired accuracy.
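
One simple way to compare engines on your own documents is to type out the correct text for a few sample pages by hand and score each engine's output against it. The Python sketch below does this with the standard library's `difflib`. It is only a rough proxy for accuracy, not an official UiPath metric, and the engine names and sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(ocr_text: str, reference: str) -> float:
    """Similarity in [0, 1] between OCR output and hand-typed reference text."""
    def normalise(s: str) -> str:
        # Ignore case and whitespace differences so layout noise is not penalised.
        return " ".join(s.split()).lower()

    return SequenceMatcher(None, normalise(ocr_text), normalise(reference)).ratio()


def rank_engines(outputs: dict, reference: str):
    """Sort engines from best to worst match against the reference text."""
    scores = {name: similarity(text, reference) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs for the same scanned page from two engines.
reference = "Invoice 12345 Acme Ltd"
outputs = {
    "engine_a": "Invoice 12345 Acme Ltd",
    "engine_b": "lnvoice l2345 Acrne Ltd",
}
for name, score in rank_engines(outputs, reference):
    print(name, round(score, 3))
```

Running this over a handful of pages per document type should show quickly whether one engine is clearly better for your scans.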

Additionally, I am sharing below the OCR accuracy matrix that was presented at UiPath DevCon. It should give you a good idea of how different OCR engines perform on different document types.

Hope this helps.


I am facing this issue when clicking on Taxonomy.
(I have already installed .NET.)
What is the solution to this error?

After clicking Yes, I get this message.