Reading a PDF and extracting specific text using Anchor Base

Hi team,

I am trying to use the Anchor Base activity to extract specific data from a PDF and store it in a variable, but it doesn't work: the output value is always null.

Can someone please look at this project and help me fix it?

Attachment: PDF_data extraction _to_Excel.zip (291.2 KB)

Hi,
Is the PDF open in the foreground? It needs to be for the Anchor Base activity to fetch the data we want.
To open it, we need to use the Start Process activity.
Cheers @mc00476004

Hi,
Yes, the Start Process activity is used, and all the actions in the sequence are performed. Please see my attached project file: PDF_data extraction _to_Excel.zip (291.2 KB)

Fine,
I saw your workflow.
May I know why you use these Send Hot Key activities?
We can use the Send Hot Key activity with the Down or PgDn key and then get the word we want with the Anchor Base activity.

Cheers @mc00476004

I used the Send Hot Key activities to set the zoom to 100%, go to page 1 with PgUp, and set the reading mode in Adobe to tagged or "infer from document" mode; all of that works fine.

The only issue is that the Anchor Base activity is not working: it returns no output.

In the Find Element activity, remove the idx attribute from the selector.
Use other selector attributes instead, such as parentid, aaname, tag, or innertext.

Hi Manish, I tried that: I kept the selector very simple by removing idx, and it validated, but there is still no output. I am using Adobe Reader XI; could that be causing problems? Is this working on your machine?

Hi,
Instead of the Get Text activity, use the Highlight activity so we can check whether that term gets highlighted or not.
Or
have we tried the Screen Scraping method?

@mc00476004

Hi, I used Highlight and, strangely, the text I wanted is not highlighted; instead, an empty area of the PDF is highlighted. That's why I see an empty string in the output. I am using Adobe Acrobat Reader DC now and am not sure why it behaves this way.

I have not tried screen scraping yet; I will try it and let you know.

Thanks

Thank you! I used the Full Text method in screen scraping, followed by regex and string manipulation, because extracting the specific data did not work for me with either Anchor Base or screen scraping.
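As a rough illustration of the regex step described above, the following Python sketch pulls the value that follows a label out of a full-text dump. The labels and the sample text are hypothetical; in the workflow itself, an equivalent pattern would be applied to the string returned by the scraping activity.

```python
import re
from typing import Optional


def extract_after_label(text: str, label: str) -> Optional[str]:
    """Return the first token that follows `label` (and an optional colon) on the same line."""
    # re.escape makes the label safe to embed; [ \t]* prevents the match from crossing a line break.
    pattern = re.compile(re.escape(label) + r"[ \t]*:?[ \t]*(\S+)")
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical full-text output from a scraped PDF page.
scraped = """ACME Ltd
Invoice No: INV-2041
Date:   12/03/2019
Total Amount: 1,250.00
"""

print(extract_after_label(scraped, "Invoice No"))  # INV-2041
print(extract_after_label(scraped, "Date"))        # 12/03/2019
print(extract_after_label(scraped, "PO Number"))   # None
```

Matching on a label rather than a fixed position keeps the extraction working when the amount of text before the label changes between documents.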

I have a question about screen scraping with Full Text: when I use it in one go, I can extract data from only one page. Is there any way to make sure all the pages are extracted? I don't want to use the Read PDF activity because it does not recognise the table structure and line breaks, which makes it difficult to use regex.
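If the text of each page can be captured separately (for example, scraping once per page after a PgDn, as suggested earlier in the thread), the per-page strings can be merged and split into table rows. This Python sketch assumes that columns in the scraped text are separated by two or more spaces and that each page repeats the same header row; both are assumptions about the layout, and the sample pages are hypothetical.

```python
import re
from typing import List

# Two or more consecutive spaces/tabs are treated as a column boundary (assumed layout).
COLUMN_SPLIT = re.compile(r"[ \t]{2,}")


def rows_from_pages(pages: List[str], header: List[str]) -> List[List[str]]:
    """Merge per-page full-text strings into one list of table rows.

    Blank lines and lines whose cells equal `header` are skipped on every page.
    """
    rows: List[List[str]] = []
    for page_text in pages:
        for line in page_text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            cells = COLUMN_SPLIT.split(line)
            if cells == header:
                continue  # skip the header row repeated on each page
            rows.append(cells)
    return rows


# Hypothetical per-page scraping results.
page1 = "Item        Qty    Price\nWidget A    2      10.00\nWidget B    5      3.50\n"
page2 = "Item        Qty    Price\nWidget C    1      99.99\n"

rows = rows_from_pages([page1, page2], ["Item", "Qty", "Price"])
print(rows)
```

Single spaces inside a cell (such as "Widget A") are preserved, which is why the split requires at least two whitespace characters.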

Hi, I don't know why people start automating things the way a human does them. Fine, if you can get this task working, but in the real world you won't get one or two files to process; you will get thousands. Will this solution be viable when you need to process thousands of files? So I recommend a different approach: use a third-party library such as iTextSharp or iText 7 with .NET to fetch the data from the PDF, then create a custom activity that fetches the data for you. You can also fetch data based on its location on the page. Please let me know if you need help:

Link: How to fetch location-based data from a PDF: https://stackoverflow.com/questions/43746884/how-to-get-the-text-position-from-the-pdf-page-in-itext-7
Link: How to create activities in UiPath:
https://docs.uipath.com/integrations/docs/how-to-create-activities
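The location-based idea can be illustrated independently of any PDF library. The linked Stack Overflow question covers obtaining text positions from iText 7; this Python sketch simply assumes a list of text chunks with (x, y) coordinates has already been extracted, and returns the nearest chunk to the right of an anchor label on the same line, which is roughly what the Anchor Base activity does on screen. The `Chunk` type, the tolerance value, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    x: float  # left edge of the chunk (hypothetical units)
    y: float  # baseline of the chunk


def value_right_of(chunks: List[Chunk], anchor: str, y_tolerance: float = 2.0) -> Optional[str]:
    """Return the text of the nearest chunk to the right of `anchor` on the same line."""
    anchor_chunk = next((c for c in chunks if c.text == anchor), None)
    if anchor_chunk is None:
        return None
    # Candidates share the anchor's baseline (within tolerance) and start to its right.
    candidates = [
        c for c in chunks
        if c is not anchor_chunk
        and abs(c.y - anchor_chunk.y) <= y_tolerance
        and c.x > anchor_chunk.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.x - anchor_chunk.x).text


# Hypothetical chunks for one page.
page = [
    Chunk("Invoice No:", 50, 700),
    Chunk("INV-2041", 140, 700.5),
    Chunk("Date:", 50, 680),
    Chunk("12/03/2019", 140, 680),
    Chunk("Page 1 of 3", 400, 700),
]

print(value_right_of(page, "Invoice No:"))  # INV-2041
print(value_right_of(page, "Date:"))        # 12/03/2019
print(value_right_of(page, "Total:"))       # None
```

Because this works on extracted coordinates rather than on the rendered window, it does not depend on the PDF viewer, zoom level, or reading mode, and the y tolerance would need tuning for real documents.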
