Extracting data from multiple PDFs in different formats

Hello,
I am trying to extract data from multiple PDFs that are in different formats and write the extracted data to Excel. I am running into issues while extracting the data. Kindly help me resolve this. I have uploaded the PDFs for reference: 101-ASR.pdf (60.5 KB), 100-Panda.pdf (42.9 KB)


Did READ PDF or READ PDF OCR help with this?
If so, follow it with the GENERATE DATATABLE activity and then write that datatable to Excel with WRITE RANGE.
Cheers @sneha_arbole

Yes, I am able to extract the data using READ PDF OCR, but it is not looping through multiple PDFs.


Fine. Store these PDFs in a folder and use this expression in an Assign activity:
arr_filepath = Directory.GetFiles("yourfolderpath", "*.pdf")

where arr_filepath is a variable of type array of String.
–Now use a For Each activity, pass the above array variable as input, and change the TypeArgument to String in the Properties panel of the For Each activity.
–Inside the loop, use READ PDF or READ PDF OCR, set the file path to item.ToString, and store the output in a variable of type String.
–Then use the GENERATE DATATABLE activity, pass the string variable as input, and store the output in a variable of type DataTable.
–Then use the WRITE RANGE activity to write that datatable to an Excel file.
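
For anyone who wants to see the looping logic outside UiPath Studio, the steps above can be roughly sketched in Python. This is only an illustration, not the UiPath implementation: `list_pdfs`, `process_all` and the `read_pdf` callback are hypothetical names, and `read_pdf` stands in for whatever the READ PDF / READ PDF OCR step would return.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the paths of all *.pdf files in `folder`, sorted by name.

    Plays the role of Directory.GetFiles("yourfolderpath", "*.pdf").
    """
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))


def process_all(folder, read_pdf):
    """Run `read_pdf(path)` once per PDF and collect (path, text) pairs.

    `read_pdf` is a placeholder for the text-extraction step; it is
    passed in so the loop itself stays independent of any PDF library.
    """
    results = []
    for path in list_pdfs(folder):  # the For Each over the array
        results.append((path, read_pdf(path)))
    return results
```

The point of the sketch is that the extraction step sits inside the loop and runs once per file path, which is what the For Each activity does with item.ToString.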

Cheers @sneha_arbole


I tried the same approach, but all the data is being extracted into one column. I have uploaded the Excel file: 2.xlsx (9.3 KB)
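
One possible cause, assuming the extracted text was turned into a table without any column separator (the workflow itself is not shown, so this is a guess): each line of text then ends up as a single cell. A rough Python sketch of the idea, with the hypothetical helper `text_to_rows` splitting each line on runs of two or more spaces:

```python
import re


def text_to_rows(text, sep_pattern=r"\s{2,}"):
    """Split raw text into rows of cells.

    Each non-empty line becomes one row; cells are split wherever
    `sep_pattern` matches (by default, two or more whitespace chars).
    The default separator is an assumption and depends on the PDF layout.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(re.split(sep_pattern, line))
    return rows
```

For example, `text_to_rows("Item   Qty\nPanda toy   2")` keeps "Panda toy" together as one cell because it contains only a single space. Real invoices often need a different separator per template, which is why a fixed rule like this tends to break across PDF formats.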

@sneha_arbole

This looks like an invoice processing case. Have you tried the Machine Learning Extractor with the community endpoint? It might work in your case!

Have a look at this example: How to use the IntelligentOCR Package

Thanks,

Ioana

Hi @sneha_arbole,

Try installing ‘UiPath.IntelligentOCR.Activities’ from the UiPath package installer.
It includes many activities that can help you solve the problem.

You can teach the Robot where to fetch the required fields from different types of templates.

The following link helped me do so.

Thanks,
Bhushan