PDF extraction using Regex expressions

Hi Community,

I have a PDF document (image below).

I am trying to extract the following fields:
OMKUMAR TOLANI (the name), DOC.NO, Policy / Cert. No., Being policy premium, VALUE ADDED TAX 5%, TOTAL

Please help me with this.

Thanks,
Praveena.

@praveena.nadi

Please share the text file so that it is easier to write the Regex expressions.

Regards,

Hi @praveena.nadi
Can you provide the PDF, if possible?

Thanks

@Parvathy

Here is the text file for your reference, thanks.

Hello Praveena,

Install the PDF Activities package, then:

  1. Use the Read PDF Text activity to get the document text
  2. Use Regex to extract the fields from that text
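
The two steps above can be sketched outside UiPath as well. This is only an illustration in Python, assuming the PDF text has already been read into a string; the sample text, the field names, and the `extract_fields` helper are all hypothetical and not taken from the attached document. Each pattern uses a capture group to isolate the value that follows its label.

```python
import re

# Hypothetical text standing in for the output of the read step;
# the real document from the thread is not reproduced here.
sample_text = (
    "TAX INVOICE\n"
    "JOHN SMITH\n"
    "DOC.NO : AB1234-56\n"
    "TOTAL 1,234.50\n"
)

# One pattern per field; the capture group holds the value to keep.
patterns = {
    "doc_no": r"DOC\.NO\s*:\s*([A-Z]+\d+-?\d+)",
    "total": r"TOTAL\s+(\d[\d,]*\.?\d*)",
}

def extract_fields(text, patterns):
    """Return {field: value or None} for each pattern."""
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result

fields = extract_fields(sample_text, patterns)
print(fields)  # {'doc_no': 'AB1234-56', 'total': '1,234.50'}
```

Returning `None` for a missing field, rather than raising, makes it easy to see which expressions need adjusting for a given document.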

Hi @praveena.nadi

Try using the expressions below.

Omkumar Tolani ------------ (?<=\n)[A-Z]+\s*[A-Z]+(?=\s*[A-Z]+.[A-Z]+\s*: )

DOC.No ----------------- (?<=[A-Z]+.?[A-Z]+\s*:\s*)[A-Z]+\d+-?\d+(?=\s*[A-Z]+[a-z]+\s*: )

Policy/Cert No ----------------------- (?<=[A-Z]+[a-z]+.\s*:\s*)[A-Z]+-\d+-?\d+-?\d+-?\d+(?=\s*[A-Z]+\s*[A-Z]+)

Being Policy Premium ----------------------------- (?<=[A-Z]+-\s*)\d+,?\d+.?\d+(?=\s)

Tax -------------------------- (?<=TAX\s*)\d+%?(?=\s*\d+.?\d+)

Total ----------------------------- (?<=[A-Z]+\s*)\d+,\s*\d+.?\d+()
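
A side note on the dots in these expressions: an unescaped `.` matches any single character, so a fragment such as `\d+.?\d+` is looser than it looks. The sketch below is in Python purely for illustration (the thread itself targets UiPath's .NET regex engine, which treats `.` the same way), and the sample strings are made up.

```python
import re

loose = r"\d+,?\d+.?\d+"    # '.' matches any character
strict = r"\d+,?\d+\.?\d+"  # '\.' matches only a literal dot

# Made-up sample values, not taken from the PDF in the thread.
amount = "1,234.50"
not_amount = "1234x50"

print(bool(re.fullmatch(loose, amount)))       # True
print(bool(re.fullmatch(strict, amount)))      # True
print(bool(re.fullmatch(loose, not_amount)))   # True: '.' accepted the 'x'
print(bool(re.fullmatch(strict, not_amount)))  # False
```

If the looser form already works on the real files, escaping the dot is optional; it mainly guards against accidental matches on other layouts.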

Hope it all works!!

Regards


@vrdabberu

Thank you very much, I will try above and get back to you.

@praveena.nadi

If you find the correct solution for your query, please mark it as the solution to close the loop.

Regards

Hi,

Except for the first one, none of the expressions are working.
Please check once.
Thanks

@Parvathy @pravallikapaluri

I have added the PDF text file.

Can you try writing the regex expressions and share them with me?

Thanks

Hi @praveena.nadi

Please remove the IgnoreCase option in the Properties panel of the Matches activity and try again. With it removed, I am able to extract all the values.
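
Why the ignore-case option matters here: the expressions in this thread rely on the difference between `[A-Z]` and `[a-z]`, and with case-insensitive matching `[A-Z]` also accepts lower-case letters. A small Python sketch of the effect follows (`re.IGNORECASE` plays the role of the IgnoreCase option; the sample line is made up and is not from the PDF).

```python
import re

# Made-up line; not taken from the PDF in the thread.
line = "Received from JOHN SMITH"

# Intended to pick out an upper-case two-word name.
pattern = r"[A-Z]+\s+[A-Z]+"

case_sensitive = re.search(pattern, line).group()
ignore_case = re.search(pattern, line, re.IGNORECASE).group()

print(case_sensitive)  # JOHN SMITH
print(ignore_case)     # Received from
```

With case ignored, the first two words on the line already satisfy the pattern, so the match lands on the wrong text.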

@praveena.nadi

Try the above regex expressions either in UiPath Studio or on the RegExr website.

@praveena.nadi

Did it work?

@vrdabberu

Yes, all the expressions are working well in UiPath Studio.

Thank you.

@praveena.nadi

Thank You

@vrdabberu

Hi, I have 2 other PDF files from which I have to extract the same fields, but these regex expressions are not working with them.
I have attached the PDF text files for the other two as well. Please check.

Thank you

@praveena.nadi

All the expressions are working fine except 3, so here are new regex expressions for those:

Being Policy Premium ------------------------

(?<=[A-Z]+\s[A-Z]+[a-z]*\s?[A-Z]+[a-z]*\.?\s*[A-Z]+[a-z]*\-?\s+)\d+,?\d+.?\d+(?=\s)

Name -------------------------

(?<=TAX INVOICE\s+).*(?=\s+DOC)

DOC.No ---------------------------------------

(?<=[A-Z]+.?[A-Z]+\s*:\s*)[A-Z]+\d+-?\d+(?=\s+[A-Z]+[a-z]*\s*[A-Z]*[a-z]*\s*: )
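
When the same expressions have to cover several PDF layouts, it can help to run every pattern against every file's text and list the combinations that find nothing. The sketch below is a hypothetical Python illustration: the three sample texts, the patterns, and the `find_failures` helper are invented for the example. Python's built-in `re` module does not accept the variable-length lookbehinds used above (.NET does), so capture groups are used instead.

```python
import re

# Hypothetical stand-ins for the text of three different PDFs.
documents = {
    "file1": "DOC.NO : AB1234-56\nTOTAL 1,234.50",
    "file2": "DOC NO : AB9876\nTOTAL 99.00",
    "file3": "DOC.NO: CD11-2\nTotal 10.00",
}

patterns = {
    # '.?' is deliberately unescaped so "DOC.NO" and "DOC NO" both match.
    "doc_no": r"DOC.?NO\s*:\s*([A-Z]+\d+(?:-\d+)?)",
    "total": r"TOTAL\s+(\d[\d,]*\.\d+)",
}

def find_failures(documents, patterns):
    """Return sorted (document, field) pairs where a pattern found no match."""
    failures = []
    for doc_name, text in documents.items():
        for field, pattern in patterns.items():
            if re.search(pattern, text) is None:
                failures.append((doc_name, field))
    return sorted(failures)

print(find_failures(documents, patterns))  # [('file3', 'total')]
```

Here only the third sample fails, because its label is written "Total" while the pattern expects "TOTAL"; a report like this points directly at the expression that needs loosening.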

Hope it works !!

@vrdabberu

I'll check and let you know.
Thank you

@vrdabberu

You are awesome!
It's working for all 3 PDF files. I will let you know if I get more files to test.
Thank you very much for your help.

Regards,
praveena.

@praveena.nadi

Thank You

Happy Automation!!
