Not able to extract data from pdf

nagini.pragna · October 14, 2022, 5:47pm

Hi,

I have three PDFs, each containing a particular number that I need to extract. The number is in the same place in all three PDFs, but when I use Read PDF With OCR, the output is different for each one. I have already tried OmniPage OCR and Tesseract OCR, and neither works. I cannot use Document Understanding or any OCR engine that requires an API key. Is there any other way to extract the number?

Thanks

Robert_Lansbergen · October 14, 2022, 11:11pm

Hi @nagini.pragna,

Have you already adjusted the Scale option of the OCR engine (Tesseract, for example)?

I also had issues reading some numbers from a website, but when I raised the Scale option to 3, it worked properly.

Note: If you are able to select the number as text, you can also use the Read PDF Text activity.

Hope this helps,
Robert

Tapan_Behera1 · October 15, 2022, 2:50am

Hi @nagini.pragna, please try Microsoft OCR and set its Scale value to 3.

nagini.pragna · October 16, 2022, 6:41am

Can you help me with the package name? I am not able to find it.

nagini.pragna · October 16, 2022, 6:57am

Thanks, Read PDF Text worked.
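
For readers who get this far: once the PDF text is available as a plain string, the number still has to be picked out of it. The following is only a rough sketch of that step in Python, not something taken from this thread. The label "Invoice No", the sample text, and the function name are all hypothetical; substitute whatever text actually precedes the number in your PDFs.

```python
import re
from typing import Optional


def extract_number_after_label(text: str, label: str) -> Optional[str]:
    """Return the first run of digits that follows `label`, or None if not found."""
    # Match the label, then up to 20 non-digit characters
    # (colon, spaces, a line break), then capture the digits.
    pattern = re.compile(re.escape(label) + r"\D{0,20}(\d+)", re.IGNORECASE)
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical text, standing in for the string returned by a PDF text reader.
sample_text = "Customer: ACME\nInvoice No: 104523\nDate: 14 October"

print(extract_number_after_label(sample_text, "Invoice No"))
```

Anchoring on the label rather than on a position in the text tends to be more robust when the three PDFs differ slightly in spacing or line breaks.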
