'Read PDF With OCR' is not reading line by line from PDF

Hi,

I am extracting text from an image in a PDF using the ‘Read PDF With OCR’ activity with the ‘UiPath Extended Languages OCR’ engine.

I am getting the following result:
456
123
789

But I expect it to read the PDF line by line and return the output as:
123
456
789

Appreciate any help.

Method 1: Configuring OCR Engine Settings

1. Adjust the Scale setting in the Properties panel.
• The scale value is typically 1.0, but higher values such as 1.5 or 2.0 can help the engine read more accurately.
2. Enable the Invert option.
• If the text in the PDF sits on a dark background, enable the Invert option so that the OCR engine detects the text better.

Method 2: Replace OCR Engine

• Apart from Extended Languages OCR, try the Tesseract OCR, Microsoft OCR or Google OCR engines.
• Because each OCR engine works differently, another engine may return the text of some documents in a more accurate order.

Instead of Read PDF With OCR, use the Digitize Document activity from the IntelligentOCR package.

Its text output follows the document layout much more closely.


I tried Digitize as well; it didn’t work. Thanks.

I tried different free OCR engines. ‘UiPath Extended Languages OCR’ is the most suitable for my requirement, but it does not return the text line by line in reading order. Thanks.
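One option not raised in this thread is to keep the Extended Languages engine and reorder its output afterwards. The Python sketch below assumes you can obtain each recognized text fragment together with its position on the page; the `Fragment` type and its `left`/`top` fields are hypothetical and are not a UiPath API. Fragments whose top edges lie within a tolerance of a line's first fragment are grouped into that line, lines are ordered top to bottom, and fragments within a line are ordered left to right.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    left: float  # hypothetical: x coordinate of the fragment's top-left corner
    top: float   # hypothetical: y coordinate of the fragment's top-left corner


def to_lines(fragments, tolerance=5.0):
    """Rebuild reading order: group fragments into lines by their top coordinate,
    then join each line's fragments sorted left to right."""
    lines = []  # each entry: (top of the line's first fragment, [fragments in that line])
    for frag in sorted(fragments, key=lambda f: f.top):
        if lines and abs(frag.top - lines[-1][0]) <= tolerance:
            lines[-1][1].append(frag)  # close enough vertically: same line
        else:
            lines.append((frag.top, [frag]))  # start a new line
    return [" ".join(f.text for f in sorted(group, key=lambda f: f.left))
            for _, group in lines]


# Fragments arriving out of order, as in the question above.
fragments = [Fragment("456", 10, 40), Fragment("123", 10, 10), Fragment("789", 10, 70)]
print("\n".join(to_lines(fragments)))
```

The `tolerance` value is an assumption to tune per document: too small and words on one slightly skewed line get split apart, too large and adjacent lines get merged.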

@chandrasekhar.jella

If you use Read PDF Text instead, it may help you; it does not change the format.

Hi @chandrasekhar.jella

If none of these work and you can’t use the paid feature, you can try the traditional method:
1-) Open the PDF, select all the text with Ctrl+A, and copy it to the clipboard with Ctrl+C.
2-) Use the Get From Clipboard activity to read the copied text.

You can get the text this way and check it. I can’t say it’s a definitive solution, but I’ve had this problem before and it was solved with this method :slight_smile: