Reading PDF and extracting specific text using Anchor Base

mc00476004 · January 1, 2020, 9:07am

Hi team,

I am trying to use the Anchor Base activity to extract specific data from a PDF and store it in a variable, but it does not work: the output value is always null.

Can someone please have a look at this project and help me fix it? Attachment: PDF_data extraction _to_Excel.zip (291.2 KB)

Palaniyappan · January 1, 2020, 9:19am

Hi
Is the PDF open in the foreground? The Anchor Base activity needs that to fetch the data we want.
For that we need to use the Start Process activity.
Cheers @mc00476004

mc00476004 · January 1, 2020, 9:21am

Hi,
Yes, the Start Process activity has been used, and all the actions in the sequence are being performed. Please see my attached project file: PDF_data extraction _to_Excel.zip (291.2 KB)

Palaniyappan · January 1, 2020, 9:57am

Fine, I saw your workflow.
May I know why we need all these Send Hotkey steps?
We can use a single Send Hotkey activity with the Down or PgDn key and then get the word we want with the Anchor Base activity.

Cheers @mc00476004

mc00476004 · January 1, 2020, 10:01am

I used the Send Hotkey activities to set the page zoom to 100%, to go back to page 1 with PgUp, and to set the reading mode in Adobe to tagged or "infer from document" mode. All of that works fine.

The only issue is that the Anchor Base activity is not working: it is unable to return any output.

Manish540 · January 1, 2020, 10:13am

In the Find Element activity, remove idx from the selector.
Use other selector attributes instead, such as parentid, aaname, tag, or innertext.

mc00476004 · January 1, 2020, 10:15am

Hi Manish, I tried that: I kept the selector very simple and removed idx, and it validated, but there is still no output. I am using Adobe Reader XI. Could that be causing the problem? Is this working on your machine?

Palaniyappan · January 1, 2020, 10:24am

Hi
Instead of the Get Text activity, use the Highlight activity, and let's check whether the term actually gets highlighted.
Or did we try the screen scraping method?

@mc00476004

mc00476004 · January 1, 2020, 12:25pm

Hi, I used Highlight and, strangely, the text I wanted is not highlighted; instead an empty area of the PDF gets highlighted. That’s why I see a blank string in the output. I am using Adobe Acrobat Reader DC now, and I am not sure why it behaves this way.

I have not tried screen scraping yet. I will try it and let you know.

Thanks

mc00476004 · January 2, 2020, 2:13pm

Thank you! I used full-text screen scraping together with regex and string manipulation, because extracting the specific data did not work for me with either Anchor Base or screen scraping.

I have a question about full-text screen scraping: when I run it in one go, I can only extract data from one page. Is there any way to make sure that all the pages are extracted? I don’t want to use the Read PDF activity, as it does not recognise the table structure and new lines, which makes it difficult to use regex.
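
For anyone following the regex route described above, here is a minimal sketch in Python rather than in a UiPath workflow. It assumes the text of each page has already been scraped separately into a list of strings; the sample page texts, the field labels, and the `extract_field` helper are all hypothetical and only illustrate the idea of joining the pages and searching once.

```python
import re

# Hypothetical page texts, standing in for whatever a full-text scrape
# returns for each page. The labels and values are made up.
pages = [
    "Invoice No: INV-1001\nCustomer: Acme Ltd\n",
    "Total Amount: 1,250.00\nDue Date: 15/01/2020\n",
]


def extract_field(text, label):
    """Return the text after 'label:' on the same line, or None if absent."""
    pattern = rf"^{re.escape(label)}\s*:\s*(.+)$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Join the pages so that one pattern can search the whole document.
full_text = "\n".join(pages)

invoice_no = extract_field(full_text, "Invoice No")
total = extract_field(full_text, "Total Amount")
missing = extract_field(full_text, "PO Number")  # label not in the sample
```

This only works if the scrape preserves line breaks, since the pattern relies on each label and its value sitting on one line.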

naveen_kumar · April 20, 2020, 2:14pm

Hi, I don’t know why people start automating things the way a human would do them. Fine, if you are able to do the task this way, but in the real world you won’t get 1 or 2 files to process, you will get thousands. Will this solution still be viable when you need to process thousands of files? So I recommend a different approach: use a third-party library such as iTextSharp or iText 7 with .NET to fetch the data from the PDF, then create a custom activity that fetches the data for you. You can also fetch data based on its location on the page. Please let me know if you need help:

Link, how to fetch location-based data from a PDF: "itext7 - How to get the text position from the pdf page in iText 7" (Stack Overflow)
Link, how to create an activity in UiPath: "Using The Activity Creator"
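
To make the location-based idea concrete, here is a rough sketch in plain Python, not iText and not a UiPath custom activity. It assumes a PDF library has already returned each word with its page coordinates; the `Word` class, the `value_right_of` helper, the tolerance, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical position of the word's line


def value_right_of(words, anchor, y_tolerance=2.0):
    """Return the words on the same line as the anchor, to its right."""
    anchor_word = next((w for w in words if w.text == anchor), None)
    if anchor_word is None:
        return None
    same_line = [
        w for w in words
        if abs(w.y - anchor_word.y) <= y_tolerance and w.x > anchor_word.x
    ]
    same_line.sort(key=lambda w: w.x)  # restore left-to-right reading order
    return " ".join(w.text for w in same_line) or None


# Hypothetical word positions, as a text-position extractor might report them.
words = [
    Word("Total:", 50.0, 700.0),
    Word("1,250.00", 120.0, 700.5),
    Word("Date:", 50.0, 680.0),
    Word("15/01/2020", 120.0, 680.0),
]

total = value_right_of(words, "Total:")
date = value_right_of(words, "Date:")
```

This is the same anchor-and-target idea as the Anchor Base activity, but it works on coordinates read from the file instead of on the rendered screen, so it does not depend on the viewer, zoom level, or window focus.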
