Extracting Specific PDF Elements (Anchor Base/Find Element only selects Entire Page)

ds56 · December 23, 2018, 5:33pm

ADP Master Control Sample.pdf (449.4 KB)

Here is a sample of a PDF I’m trying to extract data from. I’m trying to get the 401k contribution per person for the entire document (it starts on page 8, on the right side). I want to use “Y 401K” as the anchor for Find Element and read the number to its left, the contribution amount, with Get Text. However, when I try to select “Y 401K” for Find Element, the entire page gets selected.

Does anyone think getting the 401k contribution per person is something UiPath can do with this type of PDF? Or does anyone know a way to grab just that specific element rather than the entire page? I’ve also tried Find Image, but it gives me gibberish results.

ANSHUL · January 8, 2019, 10:22am

You can do these steps:

Open Web Recording.
Go to Text > Scrape > Scrape Relative.
Select the text “Y 401K”, click “Indicate” to indicate the relative region, and select the numeric part on the left.
UiPath selects the “Google OCR” method. Update the settings as shown below:

[Screenshot: Google OCR settings]
It will then extract the numeric value.
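To make the idea behind a relative scrape concrete, here is a minimal Python sketch. It is not UiPath's implementation: it assumes you already have words with positions (a hypothetical `Word` record holding the text and the left/top coordinates of its box, as an OCR engine or PDF text extractor might report them), locates the “Y 401K” anchor, and returns the closest amount-like token to its left on the same line. The coordinates, the 3-unit line tolerance, and the amount format are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Word:
    """Hypothetical word record: text plus left (x) and top (y) of its bounding box."""
    text: str
    x: float
    y: float


# Assumed amount format: "1,250.00", "1250.00" or "45" (decimals optional here).
AMOUNT = re.compile(r"^-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?$")


def find_anchor(words: List[Word], anchor: Sequence[str]) -> Optional[Word]:
    """Return the first word of the first consecutive run matching the anchor tokens.

    Assumes `words` is in reading order, so "Y" is immediately followed by "401K".
    """
    wanted = [t.upper() for t in anchor]
    n = len(wanted)
    for i in range(len(words) - n + 1):
        if [w.text.upper() for w in words[i:i + n]] == wanted:
            return words[i]
    return None


def value_left_of_anchor(
    words: List[Word], anchor: Sequence[str] = ("Y", "401K"), y_tol: float = 3.0
) -> Optional[str]:
    """Return the amount-like word nearest to the left of the anchor on the same line."""
    start = find_anchor(words, anchor)
    if start is None:
        return None
    candidates = [
        w for w in words
        if abs(w.y - start.y) <= y_tol   # same line, within the tolerance
        and w.x < start.x                # strictly to the left of the anchor
        and AMOUNT.match(w.text)
    ]
    if not candidates:
        return None
    # The rightmost candidate is the one closest to the anchor.
    return max(candidates, key=lambda w: w.x).text


# Hypothetical layout for two rows of a report.
words = [
    Word("SMITH", 10, 100), Word("JOHN", 60, 100),
    Word("1,250.00", 300, 100), Word("Y", 360, 100), Word("401K", 372, 100),
    Word("45.10", 300, 120), Word("Y", 360, 120), Word("MEDICAL", 372, 120),
]
print(value_left_of_anchor(words))  # → 1,250.00
```

Taking the rightmost candidate rather than the first number on the line matters when a row holds several numbers; only the one nearest the anchor is treated as the contribution, matching the “number to the left” described in the question.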

ds56 · January 15, 2019, 4:12pm

This works perfectly for one occurrence. Thank you!

However, I’m having trouble getting it to find every occurrence on every page; I can only get it to grab one. It needs to go through every page and collect them all (sometimes hundreds of pages).

Is this possible? It would help me SO much to get this working. Thank you!

ANSHUL · January 16, 2019, 2:08am

Hi @ds56,

You can use Send Hotkey to press Ctrl + F, type the text “Y 401K”, and keep pressing Find until the “Element Exists” activity finds: [screenshot]

That way you will know how many times you have to loop the scraping logic (e.g., 30 times).
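If the PDF has a real text layer (rather than being a scanned image), the count-then-loop idea can also be done without driving Ctrl + F in the viewer: extract the text of each page first (for example with UiPath's Read PDF Text activity) and search it directly. This swaps plain text search in for the screen-scraping approach described above. The Python sketch below assumes the page texts are already available as strings and that each amount, with two decimal places, appears just before “Y 401K” on the same line; the layout and the regular expression are assumptions to adjust against the real document.

```python
import re
from typing import List, Tuple

# Assumed layout: an amount with two decimals, then "Y 401K" on the same line.
# Spaces inside "Y 401K" are optional and case is ignored ("y 401 k" also matches).
# The lookbehind stops a match from starting in the middle of a longer number.
PATTERN = re.compile(
    r"(?<![\d,.])(-?(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2})[ \t]+Y[ \t]*401[ \t]*K\b",
    re.IGNORECASE,
)


def collect_401k(pages: List[str]) -> List[Tuple[int, str]]:
    """Return (page_number, amount) for every match, numbering pages from 1."""
    results: List[Tuple[int, str]] = []
    for page_no, text in enumerate(pages, start=1):
        for match in PATTERN.finditer(text):
            results.append((page_no, match.group(1)))
    return results


# Hypothetical extracted text, one string per page.
pages = [
    "Cover page, no deductions here",
    "DOE JANE  1,250.00 Y 401K\nDOE JANE  45.10 Y MEDICAL",
    "ROE RICHARD  980.75 y 401k\nLEE ANN  2,000.00 Y 401K",
]
found = collect_401k(pages)
print(len(found), found)  # → 3 [(2, '1,250.00'), (3, '980.75'), (3, '2,000.00')]
```

The length of the returned list is the number of occurrences, i.e. the loop count the Element Exists approach is trying to discover, and the page number on each entry makes it easier to spot-check results against the PDF.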

Hope that solves it.

ds56 · January 17, 2019, 6:49pm

Thanks! I created a Do While loop to cycle through the searches and the screen scraping. Unfortunately, Scrape Relative worked well for only a couple of them; most of the Google OCR results are either the wrong number, a number that doesn’t exist on the sheet, or missing the first part of the number.

It looks like there’s at least an 80% chance of errors, so it doesn’t look like UiPath will be able to help with this data extraction.

Unless you have another idea?
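Whatever the extraction method, OCR output can be screened before it is trusted. Here is a minimal Python sketch, assuming each scraped value is a string like “1,250.00” with exactly two decimal places, and that a control total to compare against may or may not be available (both are assumptions). It rejects strings that don't look like amounts (a letter read in place of a digit, a leading comma left by a clipped number) and optionally checks the sum. The function name `check_ocr_amounts` is hypothetical.

```python
import re
from decimal import Decimal
from typing import List, Optional, Tuple

# Assumed amount format: optional minus sign, optional thousands separators,
# exactly two decimals, e.g. "1,250.00" or "250.00".
AMOUNT = re.compile(r"^-?(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}$")


def check_ocr_amounts(
    raw: List[str], expected_total: Optional[Decimal] = None
) -> Tuple[List[Decimal], List[int], bool]:
    """Parse OCR'd amounts.

    Returns (parsed values, indexes of strings rejected by the format check,
    whether the total matches). With no expected_total, the total check passes.
    """
    values: List[Decimal] = []
    rejected: List[int] = []
    for i, text in enumerate(raw):
        cleaned = text.strip()
        if AMOUNT.match(cleaned):
            values.append(Decimal(cleaned.replace(",", "")))
        else:
            rejected.append(i)  # e.g. letter O read as zero, or clipped leading digits
    if expected_total is None:
        total_ok = True
    else:
        # Any rejected value means the sum cannot be trusted.
        total_ok = not rejected and sum(values, Decimal("0")) == expected_total
    return values, rejected, total_ok


# Hypothetical OCR output: two plausible values, one misread, one clipped.
values, rejected, total_ok = check_ocr_amounts(["1,250.00", "250.00", "98O.75", ",000.00"])
print(rejected)  # → [2, 3]
```

A clipped value that still looks valid, such as “250.00” read from “1,250.00”, passes the format check; only a comparison against an independent total (if the report provides one) would catch it.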

Rupendankhara · June 1, 2019, 10:28pm

Could anyone please look into this issue? I’d appreciate it.
