Instead of the Microsoft OCR engine

mini9301 · March 4, 2024, 6:28am

I am trying to extract text from a scanned PDF document.

I know that the previous version had the Microsoft OCR engine for this.

What can I use instead of the Microsoft OCR engine?

The engines I have available are shown in the image below.

pravallikapaluri · March 4, 2024, 6:29am

Hi @mini9301
Try using Tesseract OCR or UiPath Document OCR.

In Tesseract OCR, adjust the Scale and Profile properties and check whether the extracted output meets your requirement.

Hope it helps!!

mkankatala · March 4, 2024, 6:29am

Hi @mini9301

You can use Tesseract OCR to extract data from scanned PDFs. Select the Scanned option in the Profile dropdown.

If the data is not extracted properly, open the Properties panel, where there is an option called Scale. Try Scale values between 0.1 and 5 until the data is extracted properly.

The image below shows where you can change it.
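To make the trial-and-error idea concrete outside of Studio, here is a minimal Python sketch of the same loop: try a list of scale values and stop at the first one whose output looks right. This is not UiPath code. `run_ocr`, `is_acceptable`, and `fake_ocr` are hypothetical stand-ins, and the order of the candidate scales is only an assumption.

```python
from typing import Callable, Optional, Tuple


def first_acceptable_scale(
    run_ocr: Callable[[float], str],
    is_acceptable: Callable[[str], bool],
    scales: Tuple[float, ...] = (1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 0.5, 0.1),
) -> Optional[Tuple[float, str]]:
    """Try each scale in order and return (scale, text) for the first good result."""
    for scale in scales:
        text = run_ocr(scale)  # hypothetical: runs OCR at this scale
        if is_acceptable(text):  # hypothetical: your own quality check
            return scale, text
    return None  # no candidate scale produced acceptable text


def fake_ocr(scale: float) -> str:
    # Stand-in for a real OCR call: pretends low scales give garbled text.
    return "Invoice No 1234" if scale >= 2.0 else "1nv0 ce"


result = first_acceptable_scale(fake_ocr, lambda text: "Invoice" in text)
print(result)
```

In Studio the equivalent is manual: change the Scale property, rerun, and compare the extracted text against what you expect to see.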

Hope it helps!!

tazunnisa.badavide · March 4, 2024, 6:29am

You can use UiPath Document OCR or Tesseract OCR.

Gayathri_Mk · March 4, 2024, 7:14am

Use Tesseract OCR to extract data from scanned documents.

mini9301 · March 5, 2024, 4:30am

Since Tesseract OCR requires an Image as its input value, do I have to convert the scanned PDF file to an image file?

What should I do in this case?

mkankatala · March 5, 2024, 4:41am

You do not need to give Tesseract OCR any input directly. Use the Read PDF With OCR activity and place Tesseract OCR inside it. Provide the path of the PDF file as the input to the Read PDF With OCR activity.

See the image below for a better understanding.

Read PDF With OCR then uses Tesseract OCR to read scanned or unstructured PDFs.

Hope this is clear!!

mini9301 · March 5, 2024, 4:58am

Wow, it worked.

Thank you so much!

mkankatala · March 5, 2024, 5:00am

It’s my pleasure… @mini9301

Happy Automation!!

system · March 8, 2024, 5:00am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
