UiPath only able to read blocks of text in PDF instead of specific values

Robert_Schauer · October 23, 2019, 9:08pm

UiPath can read PDF text, but in the PDFs I am working with it only recognizes blocks of text rather than distinct values. Below is a screenshot of an example where I am trying to capture the Meter #, and the bot can only output the block of text indicated by the red box. Is there a better way to pull specific values out of a PDF like this, instead of pulling in a block of text and using regex to extract the values I want?

I would post the file, but new users don’t have the ability to post files yet.

Any help is greatly appreciated!

bcorrea · October 23, 2019, 9:40pm

Did you use the Read PDF Text activity, or the one with OCR?

Robert_Schauer · October 23, 2019, 9:43pm

Hey @bcorrea!

I have used both methods. The issue is that the PDF document is 70 or more pages long, so when I use Read PDF Text, for example, the output is a single string that includes every page of the document. Additionally, I still need to pick out specific values, so even if it is all in text format, I would still need to use regex to pick it apart.

bcorrea · October 23, 2019, 9:48pm

If you know which page the text is on, you can set the Range property so that only that page is read.

Robert_Schauer · October 23, 2019, 9:52pm

I should clarify the ultimate goal is to find specified text on each page of the document.

bcorrea · October 23, 2019, 9:56pm

OK, what exactly am I seeing in your screenshot? The PDF itself? If your PDF is text based (you can check in Acrobat Reader: search for something like "meter" and see whether it gets found), this should be easy. You just need to decide: do you need to know which page each value was found on, or do you just need a collection of values?
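To illustrate the "which page each value was found on" option, here is a rough Python sketch rather than a UiPath workflow. It assumes you already have the text of each page as a separate string (for example, by reading one page at a time), and it assumes the label looks like "Meter #" followed by an optional colon and an alphanumeric value. The pattern, the function name find_meter_numbers, and the sample page text are all made up for illustration, so adjust them to the real document.

```python
import re

# Assumed label format: "Meter #", optional colon, then an alphanumeric value.
# The real PDF may use a different layout; adjust the pattern to match it.
METER_PATTERN = re.compile(r"Meter\s*#\s*:?\s*([A-Za-z0-9-]+)")


def find_meter_numbers(pages):
    """Return (page_number, meter_value) pairs, one per match, pages numbered from 1."""
    results = []
    for page_number, text in enumerate(pages, start=1):
        for match in METER_PATTERN.finditer(text):
            results.append((page_number, match.group(1)))
    return results


# Hypothetical page text, standing in for the output of a per-page PDF read.
sample_pages = [
    "Account 001\nMeter #: 12345\nUsage 300 kWh",
    "Account 002\nMeter # A-9876\nUsage 120 kWh",
    "Summary page with no meter",
]

for page_number, meter in find_meter_numbers(sample_pages):
    print(f"page {page_number}: {meter}")
```

If you only need the collection of values, you can drop the page number and keep just the captured group.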

Robert_Schauer · October 23, 2019, 10:10pm

The screenshot shows a block of text that UiPath is able to detect when I screen scrape. A full page looks like the following:

I would, for instance, be trying to find Meter # on each page of the 70+ page document.

bcorrea · October 24, 2019, 2:38pm

Oh, so you are scraping directly from the PDF opened in something like Adobe Acrobat? Then it will be easy to get only the meter information: use the Anchor Base activity with a simple Get Text and you should be set.
