For the new Document Understanding feature, why would I use OCR for native PDFs?

birinder · May 27, 2020, 10:02am

I have gone through many videos and webinars about Document Understanding with different extractors such as regex, forms, and ML. But I see that the text extracted from PDF files comes through OCR engines like OmniOCR. Is it mandatory to use OCR? Or does UiPath decide based on the PDF file: if it is scanned, it uses OCR, and if it is native, it extracts the text through Get PDF Text or similar?

Ioana_Gligan · June 18, 2020, 4:22pm

Hello @birinder,

It is mandatory to put an OCR activity in the Digitize Document activity, but it DOES NOT GET USED unless Digitize Document decides it cannot reliably read certain pages of an incoming PDF natively.

So the OCR engine is mandatory, but whether it is used depends on the incoming document, and native PDFs do not trigger the OCR engine except in very specific situations.

Ioana
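
The per-page fallback described above can be illustrated with a minimal sketch. This is not UiPath's implementation: the `Page` class, the `looks_reliable` heuristic, and the `digitize` function are all hypothetical names invented here, and the real criteria Digitize Document uses to judge a page unreadable are not stated in this thread. The sketch only shows the general idea that an OCR engine is always supplied but is called only for pages without a usable text layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical page model: native_text is None when the page has no text layer."""
    image: bytes
    native_text: Optional[str] = None


def looks_reliable(text: Optional[str]) -> bool:
    # Assumed heuristic, for illustration only: any non-blank text layer counts as readable.
    return text is not None and text.strip() != ""


def digitize(pages: List[Page], ocr: Callable[[bytes], str]) -> List[str]:
    """Return the text of each page, calling ocr only where native reading fails."""
    results = []
    for page in pages:
        if looks_reliable(page.native_text):
            results.append(page.native_text)  # native page: OCR is not triggered
        else:
            results.append(ocr(page.image))   # scanned or unreadable page: fall back to OCR
    return results


# A stand-in OCR engine that records every call it receives.
ocr_calls: List[bytes] = []


def fake_ocr(image: bytes) -> str:
    ocr_calls.append(image)
    return "text from OCR"


native_doc = [Page(b"p1", "Invoice 001"), Page(b"p2", "Total: 42")]
mixed_doc = [Page(b"p1", "Invoice 001"), Page(b"scan", None)]
```

Running `digitize(native_doc, fake_ocr)` never calls the OCR stand-in, while `digitize(mixed_doc, fake_ocr)` calls it once, for the scanned page only. This mirrors the point of the answer: the engine must be configured, but a fully native PDF normally does not use it.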
