Extracting Data from PDF Pages

Hi there,

I’m working on a project to extract data from a PDF file (it consists of multiple invoices, one invoice per page) into an Excel table. I have a working solution for the data extraction, but it only works on page 1 and not on the following pages.

Is there some looping I can try for this problem, and how do I instruct the robot to read the next page each time? I was thinking of assigning an incrementing variable to the Range property of the ReadPDF activity, but I’m not sure how to go about it.
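For reference, the incrementing-variable idea can be sketched outside of UiPath. This is a minimal Python sketch of the loop only, not the actual workflow: `read_page`, `parse_invoice`, and `fake_pdf` are hypothetical stand-ins, and it assumes the Range property accepts a single page number passed as a string.

```python
from typing import Callable, Dict, List


def extract_all_pages(
    page_count: int,
    read_page: Callable[[str], str],
    parse_invoice: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Read one page per iteration and collect one result row per invoice."""
    rows: List[Dict[str, str]] = []
    for page_number in range(1, page_count + 1):
        # Assumption: the page range is given as a string such as "1", "2", ...
        page_range = str(page_number)
        text = read_page(page_range)      # stand-in for the PDF read step
        row = parse_invoice(text)         # stand-in for the page-1 extraction logic
        row["page"] = page_range
        rows.append(row)
    return rows


# Hypothetical data standing in for a three-page PDF, one invoice per page.
fake_pdf = {
    "1": "Invoice No: A-001",
    "2": "Invoice No: A-002",
    "3": "Invoice No: A-003",
}


def parse(text: str) -> Dict[str, str]:
    # Toy parser: take everything after the first colon.
    return {"invoice_no": text.split(":", 1)[1].strip()}


rows = extract_all_pages(len(fake_pdf), fake_pdf.__getitem__, parse)
for row in rows:
    print(row)
```

The point of the sketch is that the extraction logic that already works for page 1 stays unchanged; only the page range passed to the read step changes on each iteration, and each page produces one row for the Excel table.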

There seem to be similar questions on this topic, but I can’t really find any answers.
The source code is attached, and any help is much appreciated. Thanks!

E-Invoice.xaml (23.1 KB)

Hi @jerryl27

Can you please attach the PDF file as well?

As I understand it, you have built the workflow for the 1st page, and the robot is not reading data from the 2nd page (?)
In the pdfread activity, have you set the Range parameter to null?
If yes, then you need to go through your code and add an extra condition to check whether the current line belongs to the next page or not.

Give it a try; otherwise I will provide you with the solution.

Thanks for the reply @rahatadi

Can you please attach the PDF file as well? - Yes, please see the attachment. Sorry, due to sensitive information I have created something similar. In addition, I have also tidied up the source code.
E-Invoice.xaml (18.4 KB)
excel_invoice_multiple.pdf (53.5 KB)

As I understand it, you have built the workflow for the 1st page, and the robot is not reading data from the 2nd page (?) - Yes, that is correct.

In the pdfread activity, have you set the Range parameter to null? - I have tried setting it to “Null”, but the result is still the same.

@rahatadi

May I know what sort of activities I should be looking at?

Hi @jerryl27

Is this what you want?
Main.xaml (39.2 KB)
excel_invoice_multiple.pdf (53.5 KB)

@rahatadi
Thanks for the program. I took a short hiatus from the project, but I will try it out :)

Let me know if it requires any changes

@jerryl27

Hello Dear,

Have you found a solution?

If so, please close the topic so that others can easily find the correct answer.

Regards,
Aditya

Hello @jerryl27,

Please try out the PDF package for splitting the PDF into individual invoices, and then you might want to try this: How to use the IntelligentOCR Package. It uses the beta Machine Learning Extractor for data extraction.

You might also want to look into the Regex Based Extractor if all the invoices look the same.
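To illustrate why a regex approach suits invoices that share one layout, here is a minimal Python sketch using the standard `re` module. It is not the Regex Based Extractor itself, and the field labels (`Invoice No`, `Date`, `Total`) and the sample text are assumptions made up for illustration; the real patterns would depend on what the extracted page text actually looks like.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: one capture group per field, keyed by column name.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s+No\.?\s*:\s*(\S+)"),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total\s*:\s*([\d,]+\.\d{2})"),
}


def extract_fields(page_text: str) -> Dict[str, Optional[str]]:
    """Apply every pattern to one page of text; missing fields become None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        result[name] = match.group(1) if match else None
    return result


# Made-up page text standing in for one invoice page.
sample = "Invoice No: INV-1001\nDate: 2019-03-14\nTotal: 1,250.00\n"
fields = extract_fields(sample)
print(fields)
```

Because the same patterns are applied to every page, this only works when the labels and layout are identical across invoices; a field that is absent comes back as `None` rather than raising an error, which makes gaps easy to spot in the output table.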

Ioana