Scenario pdf data extraction

Adil · October 17, 2019, 5:01pm

Hi, I have multiple PDFs of the same format in a particular folder. I have to extract the data from each PDF and save it in an Excel file, one by one.
For example, extract the data from the first invoice using OCR or by getting the PDF data, put it into a row in Excel, then do the same for the second invoice, and so on.

Please help.

Dave · October 17, 2019, 5:25pm

Hi @Adil, this is very doable, but I don’t understand your question. What have you tried so far? What problems are you running into? We are all happy to help at this forum, but the point of the forum is not to build all of your workflows for you from scratch.

Adil · October 17, 2019, 5:50pm

Hi @Dave

I have opened all the PDF files from a folder, but I am unable to do the specific screen scraping on all of them. I have it working, but it runs for only one file. For example, there are 5 resumes in a folder and I need to extract the name from each of them.

Dave · October 17, 2019, 5:55pm

I would start by getting an array of filenames. This is an array of strings that contains the file path for each of the PDF files you want to read. This can be done with Directory.GetFiles("yourpathhere", "*.pdf"). Store the result in a string array variable.
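As a rough illustration outside of UiPath, the same "collect every PDF path in a folder" step can be sketched in Python. This is only an analogy to Directory.GetFiles, not part of the workflow; the function name list_pdf_files is made up for this example.

```python
from pathlib import Path


def list_pdf_files(folder):
    """Return the paths of all *.pdf files directly inside `folder`, sorted by name.

    Hypothetical helper for illustration; it plays the role of
    Directory.GetFiles(folder, "*.pdf") in the workflow described above.
    Subfolders are not searched.
    """
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))
```

The result is a plain list of strings, which is what the loop in the next step iterates over.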

Now use a For Each activity, change the TypeArgument to String, and give it the string array variable you just created with all of the file paths.

Within the For Each loop, use the Read PDF With OCR activity and save the output as a string variable. Use string manipulation (regex, split, index, etc.) to pull out the name and save it as another string variable. Now you can put the string variable into a datatable, a list, or directly into your application.
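To make the string manipulation step concrete, here is a minimal Python sketch of pulling a name out of extracted text with a regex. It assumes each resume contains a line such as "Name: Jane Doe"; that label, the pattern, and the function name extract_name are assumptions for this example, so adjust them to whatever your PDFs actually contain.

```python
import re

# Assumed layout: a line that starts with "Name", then ":" or "-", then the value.
NAME_PATTERN = re.compile(
    r"^[ \t]*Name[ \t]*[:\-][ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_name(text):
    """Return the text after the first 'Name:' label, or None if there is none."""
    match = NAME_PATTERN.search(text)
    if match is None:
        return None
    # strip() also removes a stray carriage return from Windows line endings.
    return match.group(1).strip()
```

Returning None when nothing matches lets the caller decide what to do with a resume where the label is missing or the OCR misread it.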

After doing the above steps, your robot will have obtained the name from all resumes in the folder you passed to Directory.GetFiles().
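Put together, the steps above amount to one loop that is written once and runs for every file. The Python sketch below shows that shape only. The read_text and extract_value arguments are hypothetical stand-ins for the Read PDF With OCR activity and your string manipulation, and the output is a CSV file (which Excel can open) rather than a real workbook, to keep the example dependency-free.

```python
import csv
from pathlib import Path


def extract_folder_to_csv(folder, out_csv, read_text, extract_value):
    """Process every *.pdf in `folder` and write one CSV row per file.

    read_text:     callable taking a Path and returning the file's text
                   (stand-in for the OCR / PDF reading step).
    extract_value: callable taking that text and returning the value to keep
                   (stand-in for the regex / split step).
    """
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        text = read_text(pdf_path)  # one file per iteration
        rows.append([pdf_path.name, extract_value(text)])

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "value"])  # header row
        writer.writerows(rows)
    return rows
```

The loop body names a single file each time, which is why it does not matter whether the folder holds 5 files or several hundred.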

Adil · October 18, 2019, 3:18am

Thanks @Dave,

We can pass only one specified file to the Read PDF With OCR activity. It cannot read all the files within a folder.

Can you please help me? I am not able to read all the PDF files within a folder.

Thanks

Dave · October 18, 2019, 1:35pm

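@Adil - yes, the activity takes one file at a time, and my instructions above account for that: the For Each loop passes it one file path per iteration, so each file in the folder is opened in turn.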

Adil · October 24, 2019, 2:29am

Hi @Dave, if we have hundreds of PDF files in a folder, will we have to add a Read PDF With OCR activity for each one?

Dave · October 24, 2019, 3:02pm

No, like I stated before, it will go through all of the PDF files in your folder one at a time until they are all finished. You program it once, then the robot will do all of them.
