Extracting data from multiple PDFs and storing it in Excel using string operations

Hey all, @ashley11 @lakshman

I want to extract data from multiple PDF files and store it in an Excel sheet using regular expressions and string operations. I received this workflow, Main.xaml (12.6 KB), from @ashley11, but it stops working if the PDF file path is changed. Now I want to run it for multiple PDFs, PDF Samples.zip (815.5 KB), stored in a folder and extract the data to an Excel sheet, Out_Sample.xlsx (7.9 KB).

Thanks in advance

Use a For Each activity to loop over the PDF files in the folder:

Directory.GetFiles("FOLDER_PATH")
This lets you process the files in the folder one by one.

Regards,
Vishnu N

@NiranjanKN

Are you reading the PDF data using the Read PDF Text activity, or by directly indicating elements on the PDF page and reading them?

@lakshman I’m reading the PDF using Read PDF Text activity only.

@NiranjanKN

Directory.GetFiles("FolderPath", "*.pdf")

The expression above returns the paths of all PDF files in that folder as an array of strings. Then use a For Each activity to iterate over the PDF files one by one.
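
For anyone who wants to check the folder loop outside UiPath, here is a minimal Python sketch of the same idea: list only the PDF files in a folder, then handle them one at a time. The names `list_pdf_files`, `process_folder` and `handle_pdf` are made up for illustration and are not part of the attached workflow.

```python
from pathlib import Path


def list_pdf_files(folder):
    """Return the paths of the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def process_folder(folder, handle_pdf):
    """Call `handle_pdf` (a hypothetical per-file step) on each PDF, one by one."""
    return [handle_pdf(pdf_path) for pdf_path in list_pdf_files(folder)]
```

In the workflow, the body of the For Each activity plays the role of `handle_pdf`: read the text of the current file, run the regex extraction, and add one row to the output.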

Hey, @lakshman @VISHNU07
Can you tell me what's wrong in this code, Main.xaml (15.4 KB)? I did as suggested, but I'm getting this error (screenshot: Capture).

You are getting this exception because the regex for InvoiceNo. returns nothing.

Can you please edit the .xaml file and send it to me, @srdjan.suc?

IDK, I can try :)

Any updates on what's wrong in this code, Main.xaml (16.1 KB), guys? @lakshman @VISHNU07 @ashley11 @KarthikByggari

@NiranjanKN

Could you please show me a screenshot of that Assign activity? I will check it.

This error usually occurs when you are trying to assign a null value.

Sure @lakshman. These are the assign statements used in the code.
(screenshot: Capture)

@NiranjanKN

Could you please run the workflow in Debug mode and check exactly where it is failing?

Also put a Write Line or Message Box activity after each Assign and print the value. That way we can easily identify where the issue is.
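
The same check can be sketched in Python: extract every field, print each value straight away, and return nothing instead of failing when a pattern does not match. This is only an illustration; the function names and any patterns passed in are assumptions and do not come from the attached Main.xaml.

```python
import re


def extract_first(pattern, text):
    """Return the first capture group of `pattern` in `text`, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def report_fields(text, patterns):
    """Extract every field and print it, so a missing value is visible right away."""
    values = {}
    for name, pattern in patterns.items():
        values[name] = extract_first(pattern, text)
        # Same role as a Write Line after each Assign in the workflow.
        print(f"{name} = {values[name]!r}")
    return values
```

A field that prints as `None` is the one whose regex matched nothing, which is the kind of value that makes a later assignment fail.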

@lakshman
I ran it in Debug mode and the error occurs in the "Assign: Invoice_No_Str" activity.
Here is the code for your reference: Main.xaml (16.1 KB)

@NiranjanKN

Change the type of the variable Invoice_No_Str from IEnumerable to String and then try again.

@lakshman The Invoice_No_Str is already a String

@NiranjanKN

I checked the workflow you attached here and found that it is of type IEnumerable, not String.

@lakshman

The problem is that the regex returns nothing, hence the error.
Look at this example (the text below is what we get from the Read PDF Text activity):

The problem is this:

2nd Floor Invoice No.

21289A

The value is on the next line, so this regex won't fetch it.

The same applies to the regex for Company_Name.
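
To illustrate the line-break issue, here is a small Python sketch using the text quoted above. Both patterns are assumptions made up for this example (the regex in the attached workflow may differ): one only looks on the label's own line, the other also accepts line breaks between the label and the value.

```python
import re

# Text as quoted in the post above: the value sits on a later line than its label.
pdf_text = "2nd Floor Invoice No.\n\n21289A\n"

# Assumed pattern for illustration: only spaces/tabs may follow the label,
# so it never reaches a value that starts on another line.
same_line = re.compile(r"Invoice No\.[ \t]*(\w+)")

# \s* also matches line breaks, so a value on a following line is found.
across_lines = re.compile(r"Invoice No\.\s*(\w+)")


def find_invoice_no(text, pattern=across_lines):
    """Return the invoice number captured by `pattern`, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None
```

Calling `find_invoice_no(pdf_text, same_line)` gives no match, while `find_invoice_no(pdf_text)` picks up the value from the later line. The same adjustment would be worth trying for the Company_Name pattern.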


This still needs fine-tuning, but give it a try: Multiple_PDF_Data_Extration.xaml (19.1 KB), for extracting from these PDFs: PDF Samples.zip (815.5 KB)

In my PDF there are a number of Time values.
I am extracting them to Excel, but all the Time values end up in a single cell.
I want to extract them into separate rows. How can I do that?