UiPath can read PDF text. However, in the PDFs I am working with, the software only recognizes blocks of text rather than distinct values. Below is a screenshot of an example where I am trying to capture the Meter #, and the bot can only output the block of text indicated by the red box. Is there a better way to pull specific values out of a PDF like this than pulling in a block of text and using regex to extract the values I want?
I would post the file, but new users don’t have the ability to post files yet.
I have used both methods, but the issue I am having is that the PDF document is 70 or more pages long, so when I use Read PDF Text, for example, the output is a single string that includes every page of the document. Additionally, I still need to pick out specific values, so even if it is all in text format, I would still need to use regex to pick it apart.
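To make the regex step concrete, here is a minimal sketch in Python (not a UiPath workflow) of pulling every Meter # out of one long string of extracted text. The label format `Meter #` followed by an optional colon, the sample text, and the name `extract_meter_numbers` are all assumptions for illustration; the real pattern depends on how the text comes out of the PDF.

```python
import re

# Assumed label format: "Meter #", an optional colon, then the value.
# Adjust the pattern to match the text the PDF actually produces.
METER_PATTERN = re.compile(r"Meter\s*#\s*:?\s*(\S+)")


def extract_meter_numbers(text: str) -> list[str]:
    """Return every value that follows a 'Meter #' label, in order."""
    return METER_PATTERN.findall(text)


# Made-up sample text standing in for the extracted PDF string.
sample = "Account 123\nMeter #: A1B2C3 Reading 450\nMeter # 998877\n"
print(extract_meter_numbers(sample))  # ['A1B2C3', '998877']
```

The same pattern should carry over to a regex-matching activity in UiPath, since it only uses basic syntax, but that is untested here.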
Ok, what exactly am I seeing in your screenshot? The PDF itself? If your PDF is text-based (you can check in Acrobat Reader: search for a word like "meter" and see whether it gets found), this should be easy. You just need to decide: do you need to know which page each value was found on, or do you just need a collection of values?
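The two options differ only in what you keep after matching. A rough Python sketch, assuming the text is already available one string per page (for example, by reading the PDF page by page); the pattern, the sample pages, and the function names are hypothetical:

```python
import re

# Assumed label format; adapt to the real document.
METER_PATTERN = re.compile(r"Meter\s*#\s*:?\s*(\S+)")


def meters_by_page(pages: list[str]) -> dict[int, list[str]]:
    """Map 1-based page number -> meter values found on that page."""
    found = {}
    for page_number, page_text in enumerate(pages, start=1):
        matches = METER_PATTERN.findall(page_text)
        if matches:
            found[page_number] = matches
    return found


def all_meters(pages: list[str]) -> list[str]:
    """Flat collection of meter values, ignoring which page they came from."""
    return [m for matches in meters_by_page(pages).values() for m in matches]


# Made-up pages for illustration only.
pages = ["Meter # 111", "no meters here", "Meter #: 222 Meter #: 333"]
print(meters_by_page(pages))  # {1: ['111'], 3: ['222', '333']}
print(all_meters(pages))      # ['111', '222', '333']
```

If only the flat collection is needed, the per-page split can be skipped and the pattern run once over the whole string.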
Oh, so your choice is to scrape directly from the PDF opened in something like Adobe Acrobat? Then it will be easy to get only the meter information: use the Anchor Base activity with a simple Get Text and be happy.