UiPath only able to read blocks of text in PDF instead of specific values

UiPath can read PDF text, but in the PDFs I am working with it only recognizes blocks of text rather than distinct values. Below is a screenshot of an example where I am trying to capture the Meter #, and the bot can only output the block of text indicated by the red box. Is there a better way to pull specific values out of a PDF like this, instead of pulling in a block of text and using regex to extract the values I want?

I would post the file, but new users don't have the ability to post files yet.

Any help is greatly appreciated!

Did you use the Read PDF Text activity, or the one with OCR?

Hey @bcorrea!

I have used both methods, but the issue is that the PDF document is 70 or more pages long, so when I use Read PDF Text, for example, the output is a single string that includes every page of the document. Additionally, I still need to pick out specific values, so even if it is all in text format, I would still need to use regex to pick it apart.

If you know which page the text is on, you can set the Range property so that only that page is read.

I should clarify the ultimate goal is to find specified text on each page of the document.

OK, what exactly am I seeing in your screenshot? The PDF itself? If your PDF is text based (you can check by searching for a word like "meter" in Acrobat Reader and seeing whether it gets found), this should be easy. You just need to decide: do you need to know which page each value was found on, or do you just need a collection of values?
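To make the two options concrete, here is a rough sketch in Python rather than a UiPath workflow. It assumes you already have the text of each page as a separate string (for example, by reading one page at a time via the Range property), and it assumes the label looks like "Meter #" followed by an alphanumeric ID. The pattern, the function names, and the sample pages are all made up for illustration, so adjust them to whatever your real PDF contains.

```python
import re

# Assumed label format: "Meter #", an optional colon, then an alphanumeric ID.
# This is a guess at the layout, not taken from the actual PDF.
METER_PATTERN = re.compile(r"Meter\s*#\s*:?\s*([A-Za-z0-9-]+)")


def find_meters_by_page(pages):
    """Map each 1-based page number to the list of meter IDs found on that page."""
    results = {}
    for page_number, text in enumerate(pages, start=1):
        results[page_number] = METER_PATTERN.findall(text)
    return results


def find_all_meters(pages):
    """Return a flat list of meter IDs across all pages, in page order."""
    by_page = find_meters_by_page(pages)
    return [meter for matches in by_page.values() for meter in matches]


# Hypothetical page texts standing in for the real per-page output.
sample_pages = [
    "Account 1001\nMeter # 12345678\nUsage 420 kWh",
    "Account 1002\nMeter #: AB-9876\nUsage 315 kWh",
    "Summary page with no meter listed",
]

by_page = find_meters_by_page(sample_pages)   # option 1: value per page
all_meters = find_all_meters(sample_pages)    # option 2: just a collection
```

The per-page dictionary keeps track of where each value came from, while the flat list is enough if you only need the values themselves.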

The screenshot shows a block of text that UiPath is able to detect when I screen scrape. A full page looks like the following:

I would, for instance, be trying to find Meter # on each page of the 70+ page document.

Oh, so your choice is to scrape directly from the PDF opened in something like Adobe Acrobat? Then it will be easy to get only the meter information: use the Anchor Base activity with a simple Get Text and be happy.
