Document Understanding

Sakshi_Jain · April 12, 2021, 5:04am

Does anyone have a sample project, with sample files, for learning Document Understanding?
Thanks in advance.

NIVED_NAMBIAR · April 12, 2021, 6:19am

Hi @Sakshi_Jain

Did you check out any of the YouTube videos on Document Understanding?

You can also check with @Lahiru.Fernando; he can help you.

Adrian_Star · April 12, 2021, 10:52am

Hi, go to:

Then search for: UiPath Document Understanding Overview

There, during the course, a sample DU workflow is created / can be downloaded.

Sakshi_Jain · April 15, 2021, 10:57am

@Adrian_Star @NIVED_NAMBIAR Yes, I watched some videos and tried it on my system.

How do we decide which activities to use for each of these steps:
load taxonomy, digitize document, classify, extract, validate, export?

Different types of OCR: Tesseract, OmniPage, Microsoft, UiPath, Google
Text extractors: regex-based extractor, form extractor, intelligent form extractor, ML extractors for invoices, receipts, purchase orders, etc.
Classifiers: keyword-based …

Also, it was not working for the set of images I was trying to get data from.

Can we see the code behind the activities we are using? For example, if I want the same automation in Python, is there any way I can get the code?

Adrian_Star · April 15, 2021, 11:30am

The taxonomy defines the document types and the fields you want to extract from them. It must always be prepared for every document type that goes through the process.

Digitization converts a scan into a digital version of the document. The characters are read and their positions are recorded in the digital version, so that, for example, you can point at a location in the Validation Station and the data extraction process understands which text is there.
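Since you asked about Python, here is a minimal sketch of the idea of "text plus positions". It is not UiPath's code or data format; the `Word` class, the `words_in_region` helper, and the sample coordinates are all made up for illustration. Each recognized word keeps a bounding box, so selecting an area on the page can be mapped back to the words inside it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def words_in_region(words, left, top, right, bottom):
    """Return the words whose boxes lie fully inside the region, in reading order."""
    hits = [
        wd for wd in words
        if wd.x >= left and wd.y >= top
        and wd.x + wd.w <= right and wd.y + wd.h <= bottom
    ]
    # Sort top-to-bottom, then left-to-right.
    return sorted(hits, key=lambda wd: (wd.y, wd.x))


# Made-up sample data standing in for the output of a digitization step.
words = [
    Word("Invoice", 50, 40, 60, 12),
    Word("No:", 115, 40, 25, 12),
    Word("INV-2021-0042", 145, 40, 110, 12),
    Word("Total:", 50, 300, 45, 12),
    Word("1,250.00", 100, 300, 60, 12),
]

# A user "draws" a box around the invoice number.
selected = words_in_region(words, 140, 35, 260, 55)
print(" ".join(wd.text for wd in selected))
```

This is only meant to show why positions matter: without them, a selection on the page could not be tied back to any text.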

Which type of data extraction you choose depends on the precision you want to achieve.
I mean, are you satisfied with roughly 60% reading accuracy, or do you need 90%+?

If the value you want always sits between the same pieces of text, and the document is an editable PDF, then you can use a regex-based extractor.
If you are dealing with a scan, you need an OCR engine.
OCR engines differ in accuracy and capabilities (scaling, rotating). It is important that the engine supports the language of the documents you want to read.
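To illustrate the regex case in Python, here is a small sketch using only the standard `re` module. It is not what the UiPath extractor does internally; the sample text, the anchor strings, and the `extract_between` helper are assumptions made up for this example. It pulls out whatever sits between two fixed pieces of text.

```python
import re

# Made-up sample text, as it might be read from an editable PDF.
text = "Invoice No: INV-2021-0042 Date: 12/04/2021 Total: 1,250.00 EUR"


def extract_between(text, before, after):
    """Return the text found between two fixed anchors, or None if absent."""
    # re.escape keeps the anchors literal; the lazy group takes the shortest match.
    pattern = re.escape(before) + r"\s*(.+?)\s*" + re.escape(after)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


invoice_no = extract_between(text, "Invoice No:", "Date:")
total = extract_between(text, "Total:", "EUR")
print(invoice_no)
print(total)
```

This only works when the surrounding text is stable and the characters are read reliably, which is why it suits editable PDFs better than scans.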

Some OCR engines automatically rotate a skewed image; others don’t.

If you have a Document Understanding server, you can train the ML model to improve its effectiveness. Without a trained model, performance may vary depending on the quality of the document.

As for the source code of the UiPath activities, it is probably not generally available to developers from the workflow level.
