Read PDF with OCR only reads a couple of pages or the last page

Hi,

I’m trying to read the text from a PDF using the Read PDF With OCR activity from uipath.pdf.activities with the Microsoft OCR engine. The problem is that it seems to read only the last page, or only the first 5 or 6 pages. Because I’m reading multiple files with different numbers of pages, I cannot hard-code the range. I saw some other topics with similar issues, but no solutions.
I have tried the ranges “1-end”, “All”, “1-11”, “1-18”, and “1-” + Pagecount.ToString.
All of them give the same incomplete result.
I do notice some faulted Microsoft OCR activities in my output, but no exception messages.


My Microsoft OCR engine properties:
image
My workflow:

I know I could work around this by reading each page separately in a loop and then joining the text together, but I would prefer to have the activity work as it is supposed to.
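
For reference, the per-page workaround could look roughly like the sketch below. It is written in Python rather than as a UiPath workflow, so treat it as an outline of the loop only. `ocr_page` is a hypothetical stand-in for a single Read PDF With OCR call whose range is set to one page number, and `read_pages_separately` is a name made up for illustration. The sketch also records which pages fault, since the faulted OCR activities did not produce any exception message.

```python
from typing import Callable, List, Tuple


def read_pages_separately(
    page_count: int,
    ocr_page: Callable[[str], str],
) -> Tuple[str, List[int]]:
    """OCR a document one page at a time and join the text in page order.

    ocr_page is a hypothetical callable: it receives a page range such as "3"
    and returns the text recognized on that page. Pages whose OCR call raises
    are skipped and reported, so a silent failure becomes visible.
    """
    texts: List[str] = []
    failed_pages: List[int] = []
    for page in range(1, page_count + 1):
        try:
            texts.append(ocr_page(str(page)))
        except Exception:
            # Keep going, but remember which page could not be read.
            failed_pages.append(page)
    return "\n".join(texts), failed_pages


if __name__ == "__main__":
    # Fake OCR function used only to demonstrate the loop.
    def fake_ocr(page_range: str) -> str:
        if page_range == "3":
            raise RuntimeError("OCR faulted")
        return "text of page " + page_range

    text, failed = read_pages_separately(4, fake_ocr)
    print(text)
    print("Pages that failed:", failed)
```

Reading page by page is slower than a single call over the whole document, but the list of failed pages shows exactly where the OCR step stops producing text.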
Thank you for your help.

Hi @Barend,
Have you tried experimenting with other engines, changing the DPI, etc.?

Hi,

Any updates?

I have a similar scenario, but with OCR Framework: it reads only the first 6 pages…

By the way, I fixed the issue where it only reads the first page by changing where I read the text from: instead of taking it from the OCR engine, I now take it from the Read PDF With OCR activity.

Hi @Barend,

Have you tried any OCR engines other than the Microsoft one? I think it would be a good idea to try OmniPage, Google OCR, etc. and compare the results. Not every engine can successfully OCR everything in a document; it depends on multiple factors. So it's always worth trying different engines to see whether the output changes.

We have used the Read PDF With OCR activity on PDFs with any number of pages, and it worked for all of them.

Regards
Sonali

Hi, I think I fixed it by using a different engine, but that was a long time ago, so I don’t remember the exact solution.
