Read multiple PDFs from a folder

Hi Team,

It's sorted now :slight_smile:
The only remaining issue: I am extracting specific data from multiple PDF files in a folder. How can I save the data from each PDF to a separate Excel sheet that has the same name as that PDF?
@lakshman
@Palaniyappan

Regards,
Rahul

Hi @Rahulsinha ,

Will the text that the regular expression needs to extract be in a dynamic position, or is it always in a static position?

Regards,
Sasidhar

Hi @Rahulsinha

Use a For Each activity and give it the expression Directory.GetFiles(<folderpath>, "*.pdf"). It returns all the PDF files in that folder.
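For anyone who wants to see the same idea outside UiPath, here is a rough Python sketch of "get every PDF in a folder". It is only an illustration of the logic, not UiPath code; the function name list_pdfs and the demo file names are made up for the example.

```python
from pathlib import Path
import tempfile


def list_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    pdf_names = [p.name for p in list_pdfs(tmp)]
    print(pdf_names)
```

Only the two .pdf files come back; notes.txt is skipped, which is the same filtering the "*.pdf" pattern does.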

Hi @Rahulsinha
Try this workflow. The PDFs are read from the 'PDF' folder; replace this path with your own PDF folder path. When saving the Excel file, type the 'NewExcelFile' value.
BlankProcess.zip (1.1 MB)

Fine,
hope these steps help you resolve this:
- Use an ASSIGN activity with an expression like this:
arr_Files = Directory.GetFiles("yourfolderpath", "*.pdf")
where arr_Files is a variable of type array of String
- Now pass this variable as input to a FOR EACH activity and change the TypeArgument to String in the Properties panel
- Inside the loop use READ PDF TEXT or READ PDF WITH OCR and store the output in a variable of type String. Any other method of getting that particular value is fine too, as long as the value finally ends up in a String variable

- Then use a WRITE CELL activity to write the specific value obtained from the PDF to Excel, or use an ADD DATA ROW activity if there are other values to put in the datatable, passing them in the ArrayRow property like this:
{"value1", "value2", …, "valuen"}

- Then, when setting the sheet name in the WRITE RANGE or WRITE CELL activity, use this:
Path.GetFileNameWithoutExtension(item.ToString)
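To show what that sheet-name expression produces, here is a small Python sketch (an illustration only, not UiPath code). The function sheet_name_for and the sample paths are hypothetical. The extra clean-up step is my own addition: Excel limits worksheet names to 31 characters and does not allow the characters \ / ? * [ ] : in them, so a long or unusual PDF name may need trimming before it can be used as a sheet name.

```python
from pathlib import Path
import re

MAX_SHEET_NAME = 31  # Excel's limit on worksheet name length


def sheet_name_for(pdf_path):
    """File name without extension, adjusted to be a valid Excel sheet name."""
    stem = Path(pdf_path).stem                      # "Invoice_001.pdf" -> "Invoice_001"
    cleaned = re.sub(r"[\\/?*\[\]:]", "_", stem)    # replace characters Excel rejects
    return cleaned[:MAX_SHEET_NAME]                 # trim to the length limit


for sample in ("C:/Docs/Invoice_001.pdf", "C:/Docs/a[1].pdf"):
    print(sample, "->", sheet_name_for(sample))
```

Whether any of the PDF names in this thread run into those limits is not known; the sketch only shows where such a check could go.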

Cheers @Rahulsinha

Hi Palani,

I have already created the workflow. My only issue is this:
how can I save the data extracted from each PDF file in the folder to a separate Excel sheet that has the same name as that PDF?

image
image

The folder contains several PDF files. The workflow should read one, create the Excel file with the same name as that PDF file, and then repeat the same for each of the other PDF files.

Yes, this would meet your requirement, buddy.
In addition to the above, we can include a BUILD DATATABLE activity at the very beginning of the workflow, that is, before the ASSIGN activity that fills the array of String.

Cheers @Rahulsinha

This is how I have started my WF buddy.

image

Kindly let me know what I need to do here, and also what I need to specify in the Build Data Table; so far I have set it up with String in the body.
How will it help to create a different Excel sheet for each of the PDFs? :slight_smile:

@Palaniyappan

Yes, please go ahead
@Rahulsinha

Actually, I asked a question in my last update:

Do you mean I should add Build Data Table before the For Each activity? If yes, what input do I need to give it, and how will it create a different Excel sheet for each PDF?

:slight_smile:

Fine
- First use a BUILD DATATABLE activity and get the output as dt
- Then a FOR EACH loop with that Directory.GetFiles expression as input
- Inside the loop use READ PDF TEXT, get the output, and use string manipulation to get that value
- Now use ADD DATA ROW with that value and dt as the datatable
- Then use APPEND RANGE with the sheet name as
Path.GetFileNameWithoutExtension(item.ToString)
- Finally, still inside the loop, use a CLEAR DATATABLE activity with dt as input
This deletes the data that was just appended, so the next iteration starts with a fresh datatable to fill.
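The steps above can be sketched in Python to make the flow of the loop easier to follow. This is only a model of the logic, not UiPath code: the dictionary stands in for the Excel file, extract_value is a hypothetical placeholder for Read PDF plus string manipulation, and the single "Value" column is an assumption.

```python
from pathlib import Path


def extract_value(pdf_path):
    """Hypothetical stand-in for Read PDF plus string manipulation."""
    return f"value from {Path(pdf_path).name}"


def build_sheets(pdf_paths):
    workbook = {}        # stand-in for the Excel file: sheet name -> rows
    header = ["Value"]   # column defined once, like Build Data Table
    dt = []              # the reusable data table (rows only)
    for pdf in pdf_paths:
        dt.append([extract_value(pdf)])         # Add Data Row
        sheet = Path(pdf).stem                  # sheet named after the PDF
        workbook[sheet] = [header] + list(dt)   # write this PDF's rows to its own sheet
        dt.clear()                              # Clear Data Table before the next PDF
    return workbook


sheets = build_sheets(["in/a.pdf", "in/b.pdf", "in/c.pdf"])
for name, rows in sheets.items():
    print(name, rows)
```

Because dt is cleared at the end of every iteration, each sheet ends up with only the rows from its own PDF. Without the clear step, the rows from earlier PDFs would be carried into later sheets.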

Cheers @Rahulsinha

Thanks buddy. I tried it, and I am able to read and write only one PDF file. When it reads the second PDF file from the folder and tries to write it to Excel, I get the error below. :frowning:
image
image

There is no error while writing the first PDF; the error above occurs when the second file is being read. There are 5 more PDF files that need to be read and written to Excel.

Use a KILL PROCESS activity next to the Excel Application Scope and set the process name to “EXCEL”

Cheers @Rahulsinha

Already tried, buddy.

image

Even after that I am still getting the error.
@Palaniyappan

Kindly place it above the Excel Application Scope.
And if possible, can I see the range specified in Append Range?

Cheers @Rahulsinha

Sure,

It reads two PDF files and writes their data to Excel, but when it reads the third PDF and tries to write it to Excel, I get the error.


image

Got it
If we use APPEND RANGE, the sheet we specify must already exist.
Otherwise, if it's a new sheet, use the WRITE RANGE activity with that sheet name and the datatable as input, and also enable the AddHeaders property.
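As a rough model of that advice (create the sheet with headers the first time, append afterwards), here is a Python sketch. It is not UiPath code and says nothing about how the real activities behave internally; the dictionary again stands in for the Excel file, and write_rows plus the sample data are hypothetical.

```python
def write_rows(workbook, sheet, header, rows):
    """Create `sheet` with a header row if it is missing, otherwise append to it."""
    if sheet not in workbook:
        workbook[sheet] = [list(header)]              # new sheet: write headers first
    workbook[sheet].extend(list(r) for r in rows)     # then add the data rows
    return workbook


book = {}
write_rows(book, "Invoice_001", ["Value"], [["first"]])    # sheet is created
write_rows(book, "Invoice_001", ["Value"], [["second"]])   # sheet exists, rows are appended
write_rows(book, "Invoice_002", ["Value"], [["other"]])    # a second sheet is created
print(book)
```

In the workflow, the same decision would be made once per PDF, before writing that PDF's datatable.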

Cheers @Rahulsinha

Nope :frowning:

I don't understand. With Append Range it creates Test.xlsx, reads the PDF, and writes it to Excel with the sheet name the same as the PDF file name. This works for two files, but on the third iteration I get the error "Append Range: Range does not exist."

Can you please guide me on how to solve this, so that it reads all the PDF files and writes them to Excel in the same way using Append Range?

As per the image, it seems this input is not a datatable variable. Is that so?
@Rahulsinha

No, it is not. The datatable variable is Finaldt, but even with that I am unable to run the workflow. I am trying to read all the PDF files in the folder and write them to one Excel file, with a different sheet for each PDF, named after the PDF file.

As of now, with the help of Append Range, I am able to read PDFs and write them to Excel (Test.xlsx) under two different sheet names, but on the third one I get an error :frowning: