PDF Text

Hello,

I am having a lot of trouble solving a PDF problem.

I have a PDF document that always has the same design/format but can be anywhere from 1 to 3 pages long.

Now I need to extract some information from the PDF, but if I use the “Read PDF Text” activity I get the full text in a completely wrong order, and it’s not really usable (because the number of words can change too).

Also, Screen Scraping doesn’t work because there can be more than 1 page: if there is a second page, for example, that page is not visible for scraping.

I also tried the “Get Text” activity, but the fields I can select are messed up too and not in the correct order.

I don’t know how to solve this problem…

If someone has an idea, I would be really thankful.

Thanks in advance and kind regards
Tobias

Have you tried Read PDF With OCR?

Another option is to read the text via OCR.

Try the Get OCR Text or Get Full Text activity on the open PDF and see what output you get.

For some reason I can only use the Tesseract OCR engine, and I get bad results. Also, Get Full Text gives the same result as the Read PDF Text activity. It is not really usable.

Try configuring the engine for the right language. Please see OCR languages and check whether that returns better results.

Can you attach the PDF for us to have a look at?
Also, please tell us what information you need from the PDF.
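
In the meantime, if the values you need always follow fixed labels, one possible approach is to anchor a regular expression on each label, so the result does not depend on where the label lands in the extracted text or on how many words surround it. Below is a minimal sketch in Python. The labels (`Invoice No`, `Date`), the date format, and the sample text are all hypothetical, since the real PDF content is unknown; the same patterns should carry over to other regex engines with little or no change.

```python
import re

# Hypothetical labels and formats: replace them with the labels in your PDF.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No\.?\s*:?\s*(\S+)"),
    "date": re.compile(r"Date\s*:?\s*(\d{2}\.\d{2}\.\d{4})"),
}


def extract_fields(text):
    """Return the first value found after each label, or None if the label is missing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up sample standing in for the out-of-order text returned by the PDF reader.
sample = "Total 12,50 Date: 03.05.2019 some other words Invoice No: A-1234 Page 1"
fields = extract_fields(sample)
print(fields)
```

This only helps if each value still sits directly after its label in the extracted text. If the label and its value get separated, the patterns would need to be adapted to whatever the output actually looks like.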

I cannot share the PDF because it contains sensitive data :confused: