For the new Document Understanding feature, why would I use OCR for native PDFs?

I have gone through many videos and webinars about Document Understanding and its different extractors, such as regex, forms, and ML. But I see that the text is extracted from PDF files through OCR engines like OmniOCR. Is it mandatory to use OCR? Or does UiPath decide based on the PDF file: if it is scanned, it uses OCR, and if it is native, it extracts the text directly through something like Get PDF Text?

Hello @birinder,

It is mandatory to add an OCR engine to the Digitize Document activity, but it DOES NOT GET USED unless Digitize Document decides it cannot reliably read certain pages of an incoming PDF natively.

So the OCR engine is mandatory, but its usage depends on the incoming document, and native PDFs do not trigger the OCR engine except in very specific situations.
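
To make the fallback idea concrete, here is a minimal Python sketch of a per-page decision. It is not UiPath's implementation: `Page`, `digitize`, and `fake_ocr` are hypothetical names, and the "reliable" check (a non-blank embedded text layer) is only an assumption standing in for whatever Digitize Document actually checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    native_text: str  # embedded text layer; empty for a scanned page
    image: bytes      # rendered page image, only needed if OCR runs


def digitize(pages: List[Page], ocr_engine: Callable[[bytes], str]) -> List[str]:
    """Return one text string per page, calling OCR only as a fallback."""
    texts = []
    for page in pages:
        # Assumed reliability check: a non-blank text layer counts as readable.
        if page.native_text.strip():
            texts.append(page.native_text)
        else:
            texts.append(ocr_engine(page.image))
    return texts


ocr_calls = []


def fake_ocr(image: bytes) -> str:
    # Stand-in for a real OCR engine; records each call it receives.
    ocr_calls.append(image)
    return "text from OCR"


# One native page and one scanned page (hypothetical sample data).
pages = [Page("Invoice 001", b"img1"), Page("", b"img2")]
result = digitize(pages, fake_ocr)
```

In this sketch the OCR engine always has to be supplied, but it is only called for the second page, which mirrors the "mandatory but not always used" behaviour described above.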

Ioana
