Unable to extract data from PDF

Hi,

I have three PDFs, each containing a particular number that I need to extract. The number is in the same place in all three PDFs, but when I use Read PDF With OCR, the output is different. I have already tried OmniPage OCR and Tesseract OCR, and neither works. I cannot use Document Understanding or any OCR engine that requires an API key. Is there any other way to extract the number?

Thanks

Hi @nagini.pragna,

Have you already adjusted the Scale option of the OCR engine (Tesseract, for example)?

I also had issues reading some numbers from a website, but when I raised the Scale option to 3, it worked properly.

Note: If you are able to select the number as text, you can also use the Read PDF Text activity.
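Once Read PDF Text returns the page as a plain string, the number still has to be picked out of it. Here is a minimal sketch of that step in Python, assuming the number follows a fixed label. The label "Reference No", the sample text, and the helper name extract_number are all hypothetical and not taken from the original PDFs; the same pattern should carry over to a regex-based activity in the workflow.

```python
import re

# Hypothetical sample of what Read PDF Text might return. The label
# "Reference No" is an assumption, not taken from the original PDFs.
SAMPLE_TEXT = """Page 1
Reference No: 4581237
Date: 15/10/2022
"""


def extract_number(text, label="Reference No"):
    """Return the first run of digits that follows the label, or None."""
    # Allow optional whitespace and one separator (:, # or -) after the label.
    pattern = re.escape(label) + r"\s*[:#-]?\s*(\d+)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1) if match else None


print(extract_number(SAMPLE_TEXT))
```

Anchoring on a label rather than a position in the text means the same pattern can be reused for all three PDFs, as long as they share the same layout.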

Hope this helps,
Robert

Hi @nagini.pragna, please try Microsoft OCR and set its Scale value to 3.

Can you help me with the package name? I am not able to find it.

Thanks, Read PDF Text worked.
