Scenario: PDF data extraction

Hi, I have several PDFs in the same format in a particular folder. I need to extract the data from each of them and save it to an Excel file, one by one.
For example, extract the data from the first invoice (using OCR or by reading the PDF text), put it into a row in Excel, then do the same for the second invoice, and so on.

Please help.

Hi @Adil, this is very doable, but I’m not sure what your question is. What have you tried so far? What problems are you running into? We are all happy to help on this forum, but the point of the forum is not to build all of your workflows for you from scratch :slight_smile:

Hi @Dave

I have opened all the PDF files from a folder, but I am unable to do the specific screen scraping for all of them. I got it working, but it only runs for one file. For example, there are 5 resumes in a folder and I need to extract the name from each of them.

I would start by getting an array of file names. This is an array of strings containing the file path of each PDF file you want to read. You can get it with Directory.GetFiles("yourpathhere", "*.pdf"). Store the result in a string array variable.
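For anyone who wants to see the same first step outside UiPath, here is a rough Python analogue of Directory.GetFiles(folder, "*.pdf"). It is an illustration, not UiPath code: the function name get_pdf_files is hypothetical, it only looks at the top level of the folder (no subfolders), and it matches the .pdf extension case-insensitively.

```python
from pathlib import Path
from typing import List


def get_pdf_files(folder: str) -> List[str]:
    """Return the full paths of the .pdf files directly inside `folder`.

    Hypothetical analogue of Directory.GetFiles(folder, "*.pdf"): subfolders
    are not searched, the extension check ignores case, and the result is
    sorted so the files are processed in a stable order.
    """
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

The result is simply a list of path strings, which is exactly what the loop in the next step iterates over.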

Now add a For Each activity, set its TypeArgument to String, and pass it the string array variable you just created with all of the file paths.
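The important point is that the loop body is written once and runs for every path. A minimal Python sketch of that shape follows; read_text is a hypothetical parameter standing in for the Read PDF With OCR activity, injected so the loop does not depend on any real PDF or OCR library.

```python
from typing import Callable, List, Tuple


def process_all(file_paths: List[str], read_text: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Apply the same steps to every file, one at a time, like a For Each loop.

    `read_text` is a hypothetical stand-in for the Read PDF With OCR activity.
    """
    results = []
    for path in file_paths:       # one iteration per file path
        text = read_text(path)    # the loop variable is the input, not a fixed file
        results.append((path, text))
    return results
```

Whether the list holds 5 paths or 500, the body is the same; only the number of iterations changes.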

Inside the For Each loop, use the Read PDF With OCR activity, give it the loop variable (the current file path) as the file to read, and save its output to a string variable. Use string manipulation (regex, Split, IndexOf, etc.) to pull out the name and save it to another string variable. You can then add that string to a DataTable or a list, or enter it directly into your application at this point.
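As an example of the string-manipulation step, here is a Python sketch that pulls the name out with a regular expression. It assumes, purely for illustration, that each resume's extracted text contains a line such as "Name: Jane Doe"; the pattern and the function name extract_name are hypothetical, and real resumes (and noisy OCR output) will need a pattern that fits their actual layout.

```python
import re
from typing import Optional

# Hypothetical layout: the name sits on its own line, e.g. "Name: Jane Doe".
# [ \t] is used instead of \s so a match never runs onto the next line, and a
# trailing \r is allowed so Windows line endings are not captured in the name.
NAME_PATTERN = re.compile(
    r"^[ \t]*Name[ \t]*:[ \t]*(.+?)[ \t\r]*$", re.IGNORECASE | re.MULTILINE
)


def extract_name(text: str) -> Optional[str]:
    """Return the first "Name: ..." value found in the text, or None."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None
```

Returning None when nothing matches lets the loop record an empty cell instead of stopping on one badly formatted resume.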

After these steps, your robot will have extracted the name from every resume in the folder you passed to Directory.GetFiles().
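To end up with one row per PDF in a spreadsheet, as the original question asked, the collected results can be written out after the loop. In UiPath this would typically be a DataTable written with an Excel activity such as Write Range; the Python sketch below instead writes a CSV file, which Excel can open, so it needs only the standard library. The function name write_rows and the File/Name column headers are hypothetical.

```python
import csv
from typing import Iterable, Optional, Tuple


def write_rows(rows: Iterable[Tuple[str, Optional[str]]], out_path: str) -> None:
    """Write a header row, then one row per processed PDF: file path and name.

    Writes CSV rather than .xlsx to avoid third-party libraries; a missing
    name (None) is written as an empty cell.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["File", "Name"])  # header row
        for path, name in rows:
            writer.writerow([path, name or ""])
```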


Thanks @Dave,

We can only pass a single file to the Read PDF With OCR activity; it cannot read all the files in a folder.

Can you please help me? I am not able to read all the PDF files in a folder.


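@Adil, yes: my instructions above open each file in the folder one at a time, because the For Each loop passes a different file path to the Read PDF With OCR activity on every iteration.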

Hi @Dave, if we have hundreds of PDF files in a folder, will we have to add a Read PDF With OCR activity for each one?

No. As I said before, the loop goes through all of the PDF files in your folder one at a time until every one has been processed. You add the activity once, inside the For Each, and the robot runs it for every file.