Only part of the PDF file is read

I’m trying to read a PDF file, but when I use “Read PDF Text” and “Read PDF With OCR”, I don’t get the full text. Part of the text is missing.
When I open the file in a PDF reader, I also cannot select the part of the text that is missing when I read the file with “Read PDF Text”.
I would be grateful for any ideas for solving the problem.

@Lieben
If you can’t select the text in the PDF, that usually means the text is stored in the PDF as an image, which is why Read PDF Text is not returning it. The Read PDF With OCR activity should work on images, though. If it is not returning the text, you may need to tweak its settings to help the bot. Also, if the text is rotated in any way, the bot will have a difficult time reading it.

@DanielMitchell
The text in that part is of good quality, visually no different from the text that is recognized. Could you tell me which settings I should adjust to improve recognition?

@Lieben Could you share the PDF? That would make it easier for others to suggest a solution.

@indra
Problem with retrieving “invoice number” and "date of issue"1.pdf (188.0 KB)

@Lieben You can use the Read PDF With OCR activity to get the invoice number and the date of issue.

@indra
Thanks, I think the first time I chose an unsuitable OCR engine.

@Lieben I used Google OCR and got the result.