Extract specific text from PDF to Excel

I am trying to extract specific text from a PDF into Excel. Could someone help me with how to do that?
I have a PDF file with an invoice number, an invoice date, and a due date. I want to extract these three values from the PDF document and put them into Excel.

Hi @venky

Read the PDF first, then use regex to extract the required data. Next, prepare a data table, insert the extracted data into it, and finally use the Write Range activity to write it to Excel.

If possible, share the PDF so that we can suggest regex patterns to get the invoice number, invoice date, and due date.
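
The steps described above (read the text, apply regex, build a table, write it out) can be sketched outside the workflow tool as follows. This is only an illustration in Python, not the Write Range activity itself. The sample text, the three patterns, and the helper names `extract_fields` and `write_rows` are assumptions, and it writes CSV (which Excel can open) rather than a native workbook.

```python
import csv
import io
import re

# Hypothetical text, standing in for whatever the PDF reading step returns.
SAMPLE_TEXT = """Invoice #12345
Invoice date: 12th Mar 2019
Due date: 11th Apr 2019
"""

# Assumed label patterns; adjust them to the real PDF layout.
# Group 1 of each pattern captures the value that follows the label.
PATTERNS = {
    "Invoice Number": r"Invoice\s*#\s*(\d+)",
    "Invoice Date": r"Invoice date:\s*(.+)",
    "Due Date": r"Due date:\s*(.+)",
}


def extract_fields(text):
    """Return one table row: column name -> extracted value ('' if not found)."""
    row = {}
    for column, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        row[column] = match.group(1).strip() if match else ""
    return row


def write_rows(rows, fileobj):
    """Write a header line and the rows as CSV to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=list(PATTERNS))
    writer.writeheader()
    writer.writerows(rows)


# Demonstration with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_rows([extract_fields(SAMPLE_TEXT)], buffer)
print(buffer.getvalue())
```

To write a real file, pass a file opened with `newline=""` to `write_rows`. With several PDFs, call `extract_fields` once per document and pass all the rows together.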

Hi @venky
I used Anchor Base with Find Element, but this does not work in the new Adobe Reader DC.
I needed to compare the data in Excel with the data in the PDF.

Maybe it will help you :slight_smile:

@venky Is it possible to share a screenshot or the PDF?

We can use the Computer Vision activities to access these elements individually and get them as strings. Kindly have a look at this, buddy.

Cheers @Sajuri

This is the screenshot of my PDF.

@indra, @kalyanDev The above screenshot shows the PDF file I am working with.

Did that activity work for your scenario, buddy @venky?

@venky You can use regular expressions to get the data from the PDF.
The regexes for the invoice number, invoice date, and due date are here:

' Requires: Imports System.Text.RegularExpressions
' pdfText is a String variable holding the text read from the PDF
Dim match As Match = Regex.Match(pdfText, "Invoice [#\d]{6} ", RegexOptions.IgnoreCase)
Dim invoice As String = match.Value
match = Regex.Match(pdfText, "Invoice date: [\d]{2}[\w]{2}[\s][\w]{3}[\s][\d]{4}", RegexOptions.IgnoreCase)
Dim invoiceDate As String = match.Value
match = Regex.Match(pdfText, "due date: [\d]{2}[\w]{2}[\s][\w]{3}[\s][\d]{4}", RegexOptions.IgnoreCase)
Dim dueDate As String = match.Value
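
One thing to note about the patterns above: `match.Value` returns the whole matched text, so each result still includes its label (for example "Invoice date: "). A capture group keeps only the value. Here is a rough sketch of that idea in Python, using the same character classes. The sample text and the helper name `first_group` are made up for illustration, and the real PDF text may be laid out differently.

```python
import re

# Hypothetical sample of the text read from the PDF; the real layout may differ.
pdf_text = "Invoice #12345 \nInvoice date: 12th Mar 2019\nDue date: 11th Apr 2019\n"

# Same date shape as the patterns above, e.g. "12th Mar 2019".
DATE = r"\d{2}\w{2}\s\w{3}\s\d{4}"


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(1) if m else None


# The parentheses mark the part to keep; the label stays outside the group.
invoice = first_group(r"Invoice ([#\d]{6})\s", pdf_text)
invoice_date = first_group(rf"Invoice date: ({DATE})", pdf_text)
due_date = first_group(rf"Due date: ({DATE})", pdf_text)

print(invoice, invoice_date, due_date, sep=" | ")
```

These patterns expect a two-digit day, so a date such as "1st Mar 2019" would not match. Changing `\d{2}` to `\d{1,2}` would allow it.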

Thank you, buddy, but I just wanted to help him :smiley: My problem with the PDF was solved in the UI tutorials :slight_smile:
But thank you for your help @Palaniyappan

Fine
Cheers @Sajuri
