How to get Data from the file

Hi All,

I have a directory containing thousands of PDF files.
These PDF files are different form types, e.g. Form A, Form B, Form C, Form D and Form E. We can’t tell from the file name whether a file is Form A, Form B, Form C and so on.
I want to extract a set of 8 fields from these files, e.g.:
Form Prepared By:
1. Print preparer’s name
3. Check box for (self prepared or not)
5. Firm’s name
6. Firm’s address
7. Firm’s EIN
8. Phone No

In Form A, this information can be on page 4.
In Form B, the above fields can be on page 8.
In Form C, the above fields can be on page 13.
In Form D, the above fields can be on page 6.
In Form E, the above fields can be on page 4, 7, 9 or 13.
How can I extract these fields into an Excel file?

Please help

Hi @Khooshbu_Jani1,

It’s a little tricky with simple automation but easier with Document Understanding. If you already know Document Understanding, you can do this easily; otherwise I can send you a course link, and by following the course you can work on this project.




It would help if you could share some sample input data…

We can use RegEx to extract the required fields…
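To make the RegEx idea a bit more concrete, here is a rough sketch in Python (not UiPath) of pulling a few of the fields out of text that has already been read from a PDF page. The label patterns, the `FIELD_PATTERNS` and `extract_fields` names, and the sample text are all assumptions for illustration; the real forms may lay out the labels and values differently, so the patterns would need adjusting against actual samples.

```python
import re

# Hypothetical patterns: label wording and value formats are assumptions,
# not taken from the real forms.
FIELD_PATTERNS = {
    "firm_name": r"Firm[’']s name[:\s]+(.+)",
    "firm_address": r"Firm[’']s address[:\s]+(.+)",
    "firm_ein": r"Firm[’']s EIN[:\s]+(\d{2}-?\d{7})",
    "phone": r"Phone No\.?[:\s]+([\d\-\(\) ]{7,})",
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None when not found)."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[name] = match.group(1).strip() if match else None
    return result


# Made-up sample text standing in for one page of extracted PDF text.
sample = """Firm's name: Acme Tax Services
Firm's address: 12 Main St, Springfield
Firm's EIN: 12-3456789
Phone No: (555) 010-2030
"""

fields = extract_fields(sample)
```

The same patterns could probably be reused in a UiPath RegEx activity with minor changes, but that would need testing on real documents.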


We can use DU (the licensed one)…

We can’t do this with Community DU, as it has some restrictions…


I would suggest using the Read PDF Text activity and storing the output in a variable

Then use some string manipulation (RegEx, as I mentioned earlier) to get the data…
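Since the page holding these fields differs per form type, one option is to search every page for a label instead of relying on a fixed page number. Below is a minimal Python sketch of that idea, assuming the PDF text has already been read into one string per page. The `documents` data, the function names, and the choice of "Firm's EIN" as the anchor label are assumptions for illustration. The output is CSV text, which Excel can open.

```python
import csv
import io
import re

# Assumed anchor label and value format; adjust to the real forms.
EIN = re.compile(r"Firm[’']s EIN[:\s]+(\d{2}-?\d{7})", re.IGNORECASE)
ANCHOR = re.compile(r"Firm[’']s EIN", re.IGNORECASE)


def find_preparer_page(pages):
    """Return (1-based page number, text) of the first page with the anchor."""
    for number, text in enumerate(pages, start=1):
        if ANCHOR.search(text):
            return number, text
    return None, ""


def build_rows(documents):
    """Build one output row per file from {file name: [page texts]}."""
    rows = []
    for file_name, pages in documents.items():
        page_number, text = find_preparer_page(pages)
        match = EIN.search(text)
        rows.append({
            "file": file_name,
            "page": page_number,
            "firm_ein": match.group(1) if match else "",
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", "page", "firm_ein"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up page texts standing in for the output of a PDF reader.
documents = {
    "doc1.pdf": ["cover page", "other text", "Firm's EIN: 12-3456789"],
    "doc2.pdf": ["no preparer section here"],
}

rows = build_rows(documents)
csv_text = rows_to_csv(rows)
```

Files where no page matches end up with an empty page and EIN, which makes them easy to filter out and review by hand.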