Need support to extract data from PDF to Excel sheet

Hello, I need support. I have created a project to extract specific data from a PDF file into a message box, and the result came out as shown in the attached photo. Please advise what I can do to get the correct result.

I am doing this for many PDF invoices, so I am using the Anchor Base activity to capture the same data for all PDF invoices in the same folder.

@mohamed.saty2012

Check the output of the Get Text activity you used.

The Get Text activity is not giving the proper output, so recheck it.

Or
You need to modify the selector of the Get Text activity, for example by making it dynamic.

Cheers


As you can see, the Anchor Base activity cannot select the whole line as an anchor; it only selects part of the line. This also happens with the rest of the data that I need to extract.
This invoice is the official form for all invoices at my workplace. Please, any advice would help. This is very important to me, as I am new to Studio.

Why don't you try the Read PDF Text activity?

Try this:

1. Take a Read PDF Text activity, pass the invoice path to it, and create a variable for its output.

2. After that, you can use string manipulation or regex to get the expected output.
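
For anyone new to regex, here is a rough sketch of what the second step does, written in Python only because it is easy to run on its own. UiPath uses the .NET regex engine, so this is an illustration of the idea, not the workflow itself. The sample text, the labels, and the helper name `extract_field` are all made up for the example; the real string is whatever Read PDF Text returns for your invoice.

```python
import re

# Hypothetical stand-in for the string returned by Read PDF Text.
pdf_text = "Invoice No: 1042\nCustomer: Example Trading\nTotal: 1500.00\n"

def extract_field(text, label):
    """Return the rest of the line after `label`, trimmed, or None if absent."""
    # re.escape makes the label match literally; ".*" stops at the line break.
    match = re.search(re.escape(label) + r"[ \t]*(.*)", text)
    return match.group(1).strip() if match else None

invoice_no = extract_field(pdf_text, "Invoice No:")
customer = extract_field(pdf_text, "Customer:")
print(invoice_no, customer)
```

The point is that once the whole PDF is one string, each value can be found by the label printed in front of it, with no selectors or anchors involved.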

@mohamed.saty2012

Sorry, but I don't know how to do the second step. That's why I chose the Anchor Base activity, as it is easier for me. I am new to RPA.

Hi @mohamed.saty2012

It's better to use regex or string manipulation to extract the data instead of relying on the Anchor Base activity (it is not always the best option).
Check this to learn regex:

You can write regex code at:

Instead of Anchor Base, you can go with string manipulation or regex; both are well suited to your process.

Take a Read PDF Text activity and pass it the file path. The output will be a string, so you can share that text here and we can then provide a regex pattern for whatever you want to extract from that particular PDF.

@mohamed.saty2012

Thanks, all, for your support and help. I will study regex so I can understand it and use it.
I have no coding background, so I will try to complete this automation with the guidelines you gave me. I hope I can do it.
Thanks again, all.

Also, if I may ask, could anyone tell me which programming language is best to learn if I continue in RPA?

@mohamed.saty2012

Learning VB.NET would be beneficial.


As you can see, I have managed to work with regex and to select the registration number from the invoice, but the match also contains the # symbol. How can I remove it?
If anyone can write the code down, I would be grateful.

@mohamed.saty2012

Check this:

(?<=Registration\sNumber\s.)\d+

Hi @mohamed.saty2012 ,

Try the expression below:

(?<=Registration number\s+#).*

This works when the registration number always has the # at the beginning.

If not, we could perform post-processing to keep only the digits in the extracted value.
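
As a minimal sketch of that post-processing idea, shown in Python for illustration (in UiPath the same replacement can be done with the .NET regex engine): the helper name `keep_digits` and the sample values are assumptions, not taken from the actual invoice.

```python
import re

def keep_digits(value):
    """Drop every character that is not a digit (\\D means 'non-digit')."""
    return re.sub(r"\D", "", value)

# Hypothetical extracted values, with and without the leading symbol.
print(keep_digits("#123456789"))
print(keep_digits("123456789"))
```

This way it does not matter whether the # is present, since anything that is not a digit is removed after the match.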

Thanks, what you sent gave me an idea. I typed it in and it works.
I just added the symbol that I want to remove, as follows:
(?<=Registration\sNumber\s#)(.*?(?=\s))
and it worked.
Thanks for your support.

Thanks for your support and time. I really appreciate it.


Hello, how can I select the number that is highlighted in the screenshot, using a regex pattern?

Hi @mohamed.saty2012

Use this regex expression:

(?<=Taxpayer Activity Code:\n)\d+
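
If that pattern returns no match on some invoices, one possible cause (an assumption, since the raw text is not shown here) is that the extracted text uses a Windows line ending or extra spaces after the label, so a single \n is not what follows the colon. A more tolerant variant, sketched in Python with a capture group and a made-up sample string:

```python
import re

# Hypothetical text; real Read PDF Text output may differ in spacing and line endings.
sample = "Taxpayer Activity Code:\r\n4690\r\nTaxpayer Name: Example Co\r\n"

# \s* absorbs any mix of spaces and line breaks between the label and the digits.
ACTIVITY_CODE = re.compile(r"Taxpayer Activity Code:\s*(\d+)")

def activity_code(text):
    """Return the digits that follow the label, or None if the label is missing."""
    match = ACTIVITY_CODE.search(text)
    return match.group(1) if match else None

print(activity_code(sample))
```

The value is taken from group 1 instead of from the whole match, which avoids needing a lookbehind at all.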

Thanks a million. I really appreciate it.


Hello, how can I select the text that is highlighted in the screenshot, using a regex pattern?