Extract two values from multiple PDF files and store them

Here is the scenario:

There is a folder called Invoice Folder.
It contains multiple PDF purchase invoices.
I need to fetch two values from each one:
the Document No. and the Total.

The For Each loop works at the moment, but it fetches everything from the PDF.

The question is:
How do I take these two values and store them in Excel?


Document No.    Total Value
Doc 1           143
Doc 2           155
Doc 3           621

Hello Veselin and welcome to the community!

Is there any chance you can share those invoices?
Are they text-based or scanned images?

We can use the PDF Activities to read the PDF into a String, then use string manipulation and Regex to extract the information we need.
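To illustrate the Regex step outside UiPath, here is a minimal Python sketch. It assumes, hypothetically, that the extracted text puts each value on the same line as its label (for example "Invoice Number INV-3337"); real Read PDF output may order labels and values differently, so check it first. The names extract_fields, INVOICE_RE and TOTAL_RE are made up for this example.

```python
import re

# Hypothetical layout: each value sits on the same line as its label.
# [ \t]+ (instead of \s+) stops a pattern from jumping to the next line
# when a label and its value are printed on separate lines.
INVOICE_RE = re.compile(r"Invoice Number[ \t]+(\S+)")
TOTAL_RE = re.compile(r"Total Due[ \t]+\$?([\d,]+\.\d{2})")

def extract_fields(text):
    """Return (invoice_number, total_due); a field is None if its pattern finds nothing."""
    inv = INVOICE_RE.search(text)
    total = TOTAL_RE.search(text)
    return (inv.group(1) if inv else None,
            total.group(1) if total else None)

# Assumed sample text, not real Read PDF output.
sample = "Invoice Number INV-3337\nTotal Due $93.50"
print(extract_fields(sample))
```

Restricting the whitespace to spaces and tabs is a deliberate choice: with \s+, a layout like "Total Due" followed by "$85.00 Sub Total" on the next line would silently return the wrong amount instead of no match.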

If you cannot share the invoices with us, can you use the Read PDF activity or the Read PDF With OCR activity, save the output to a text file, and send us a copy of the part you want to extract?

The files are text-based. The only image is the logo, and we don't need to extract it.
I'm using a test invoice document to show you the result. If I learn how to do it with this document,
I can apply it to the real invoices. They have a similar structure and are both text-based.

I don't have permission to upload a .txt file, so I will paste the result here.

I bolded the values I need so you can find them more easily.

DEMO - Sliced Invoices
Suite 5A-1204
123 Somewhere Street
Your City AZ 12345
Test Business
123 Somewhere St
Melbourne, VIC 3000
Invoice Number
Order Number
Invoice Date
Due Date
Total Due
$85.00 Sub Total
Total Invoice
January 25, 2016
January 31, 2016
ANZ Bank Service
Web Design
This is a sample description… Adjust
0.00% Sub Total
ACC # 1234 1234
BSB # 4321 432
Payment is due within 30 days from date of invoice. Late payment is subject to fees of 5% per month.
Thanks for choosing DEMO - Sliced Invoices
Page 1/1

The result should be:

Invoice Number    Total Due
INV-3337          93.50

Invoices.xaml (9.7 KB)

Here is what happens in this workflow.

I don't have the part where I read the PDF, so I assigned the text you sent me to the PDF string variable.
I use the Regex Matches activity to extract the Invoice Number and Total Due values.
After I get the values, I create a DataTable and add the values to it. Note that these patterns match this example; if the next invoice has a different layout, you will have to change them. If you have multiple invoices, do this inside a loop.
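For several invoices, the loop could look roughly like this in Python. This is a sketch of the idea, not the attached Invoices.xaml: a list of tuples stands in for the DataTable, and the patterns again assume, hypothetically, that each value is on the same line as its label (for example "Invoice Number INV-3337" and "Total Due $93.50"). The function name build_table is hypothetical.

```python
import re

# Patterns for an assumed same-line layout; tune them per invoice template.
INVOICE_RE = re.compile(r"Invoice Number[ \t]+(\S+)")
TOTAL_RE = re.compile(r"Total Due[ \t]+\$?([\d,]+\.\d{2})")

def build_table(pdf_texts):
    """Collect one (Invoice Number, Total Due) row per invoice text.

    The returned list plays the role of the DataTable: header row first.
    """
    table = [("Invoice Number", "Total Due")]
    for text in pdf_texts:  # one iteration per PDF, like the For Each loop
        inv = INVOICE_RE.search(text)
        total = TOTAL_RE.search(text)
        if inv and total:
            table.append((inv.group(1), total.group(1)))
        # Texts where either pattern fails are skipped; you might log them instead.
    return table

# Hypothetical inputs: one matching invoice text and one that does not match.
texts = ["Invoice Number INV-3337\nTotal Due $93.50", "text without the labels"]
print(build_table(texts))
```

Skipping non-matching invoices keeps the table clean, but for real runs it is usually worth recording which files failed so the patterns can be adjusted.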

After that, I use the Excel Application Scope and Write Range activities to write the DataTable to an XLSX file.
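As a stdlib-only Python stand-in for the Excel step, the sketch below writes the same rows to a CSV file, which Excel can open. This swaps XLSX for CSV; producing a real .xlsx file would need a third-party library such as openpyxl. The function names and the file name are hypothetical.

```python
import csv
import io

def table_to_csv_text(rows):
    """Render rows (header first) as CSV text using the csv module's default Excel dialect."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def write_table(rows, path):
    """Write rows to a CSV file at path, e.g. a hypothetical 'invoices.csv'."""
    # newline="" keeps the csv module's \r\n line endings from being translated.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(table_to_csv_text(rows))

rows = [("Invoice Number", "Total Due"), ("INV-3337", "93.50")]
print(table_to_csv_text(rows), end="")
```

Calling write_table(rows, "invoices.csv") would save the file; values containing commas are quoted automatically by the csv module.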

Hope this helps.


Did this help you, @Veselin_Ganchev?

I marked your answer as the solution.

