OCR on a PDF: targeting by coordinates is not accurate

That might be a poorly worded title, so let me describe the problem in more detail.

I have a large number of PDFs, all in the same format, and I want to read each one and write its information to a CSV file. My sequence opens each PDF in PDF-XChange Viewer.

For example, a section of the PDF looks like this:

I use recording mode to screen scrape that region, as shown below:

Using Google OCR, this works fine:

But when I run it in my sequence, it doesn’t return the value shown above. It seems to scrape a box of similar size, but a few pixels lower, i.e.:

Should I be approaching this differently?

I’ve put some screenshots in an album because, as a new user, I can only attach one image. They seem to have gotten out of order:

Hello,

There is a related thread about this here:

In that thread, the user needs to read a PDF and extract the invoice amount. A XAML file is attached there; can you please try it out and let us know your findings?
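If the PDFs have a text layer, another option is to skip coordinate-based OCR entirely: extract the page text (for example with UiPath's Read PDF Text activity) and locate the amount by its label instead of its position on screen, so a few pixels of offset no longer matter. Below is a minimal Python sketch of that second step, assuming the text has already been extracted and that the value follows a label like "Invoice Amount". The label, the regex, and the helper names `extract_invoice_amount` and `rows_to_csv` are hypothetical; adjust them to the real documents.

```python
import csv
import io
import re

# Hypothetical label and number format: adjust to match the real invoices.
# Matches e.g. "Invoice Amount: $1,234.50" or "Invoice amount - 89.99".
AMOUNT_PATTERN = re.compile(
    r"Invoice\s+Amount\s*[:\-]?\s*[£$€]?\s*([0-9][0-9,]*\.[0-9]{2})",
    re.IGNORECASE,
)


def extract_invoice_amount(text):
    """Return the amount following the (assumed) 'Invoice Amount' label, or None."""
    match = AMOUNT_PATTERN.search(text)
    if match is None:
        return None
    # Strip thousands separators so the value parses cleanly as a number later.
    return match.group(1).replace(",", "")


def rows_to_csv(rows):
    """Serialise (filename, amount) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "invoice_amount"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for text already extracted from two PDFs (made-up sample data).
    extracted = {
        "invoice_001.pdf": "Supplier: ACME Ltd\nInvoice Amount: $1,234.50\nDue: 30 days",
        "invoice_002.pdf": "Invoice amount - 89.99\n",
    }
    rows = [(name, extract_invoice_amount(text)) for name, text in extracted.items()]
    print(rows_to_csv(rows), end="")
```

Searching by label works as long as the label text is stable across documents, which seems likely since all the PDFs share one format. If the PDFs are scanned images with no text layer, OCR is still needed, but the same label-based search can be applied to the OCR output of the whole page.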

Regards,
