Extracting Data from PDF Pages

jerryl27 · March 1, 2019, 1:57am

Hi there,

I’m working on a project to extract data from a PDF file (it consists of multiple invoices, one invoice per page) into an Excel table. I have a working solution for the data extraction, but it only works on page 1 and not on the following pages.

Is there some kind of loop I can try for this problem, and how do I instruct the robot to read the next page each time? I was thinking of assigning an incrementing variable to the Range property of the ReadPDF activity, but I’m not sure how to go about it.

There seem to be similar questions on this topic, but I can’t find any answers.
The source code is attached, and any help is much appreciated. Thanks!

E-Invoice.xaml (23.1 KB)

rahatadi · March 1, 2019, 2:13am

Hi @jerryl27

Can you please attach pdf file as well?

As I understand it, you have built the logic for the 1st page, but the robot is not reading data from the 2nd page (?)
In the pdfread activity, have you set the Range parameter to null?
If yes, then you must go through your code and add an extra condition to check whether the next page's line is the current line or not.

Give it a try; otherwise I will provide you with the solution.

jerryl27 · March 1, 2019, 2:54am

Thanks for the reply @rahatadi

Can you please attach pdf file as well? - Yes, please see the attachment. Sorry, because the original contains sensitive information, I have created something similar instead. In addition, I have also tidied up the source code.
E-Invoice.xaml (18.4 KB)
excel_invoice_multiple.pdf (53.5 KB)

As I understand it, you have built the logic for the 1st page, but the robot is not reading data from the 2nd page (?) - Yes, that is correct.

In the pdfread activity, have you set the Range parameter to null? - I have tried setting it to “Null”, but the result is still the same.

jerryl27 · March 1, 2019, 3:18am

@rahatadi

May I know what sort of activities I should be looking at?

rahatadi · March 7, 2019, 4:53am

Hi @jerryl27

Is this what you want?
Main.xaml (39.2 KB)
excel_invoice_multiple.pdf (53.5 KB)

jerryl27 · March 13, 2019, 3:08am

@rahatadi
Thanks for the program. I took a short hiatus from the project, but I will try it out.

rahatadi · March 13, 2019, 3:25pm

Let me know if it requires any changes.

rahatadi · July 4, 2019, 12:26pm

@jerryl27

Hello Dear,

Have you got a solution?

Please close the topic so that others can easily find the correct answer.

Regards,
Aditya

Ioana_Gligan · September 18, 2019, 7:25am

Hello @jerryl27,

Please try out the PDF package for splitting the PDF into individual invoices, and then you might want to try this: How to use the IntelligentOCR Package - it uses the beta Machine Learning Extractor for data extraction.
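
To illustrate the one-invoice-per-page idea outside of Studio, here is a minimal Python sketch of the loop (not UiPath code). It assumes a reader that accepts a single page number as a string; `read_pages_one_by_one`, `fake_read_page` and the sample text are hypothetical names and data made up for the example, so you would swap in your real PDF reading step.

```python
from typing import Callable, List


def read_pages_one_by_one(read_page: Callable[[str], str], page_count: int) -> List[str]:
    """Call read_page once per page and collect the text of each page."""
    page_texts = []
    for page_number in range(1, page_count + 1):
        # Assumption: the reader accepts a single page number as a string,
        # e.g. "1", "2", "3".
        page_range = str(page_number)
        page_texts.append(read_page(page_range))
    return page_texts


# Hypothetical stand-in for a real PDF reader, so the sketch runs on its own.
SAMPLE_PAGES = {
    "1": "Invoice No: INV-001\nTotal: 120.50",
    "2": "Invoice No: INV-002\nTotal: 75.00",
}


def fake_read_page(page_range: str) -> str:
    """Return the made-up text for the requested page."""
    return SAMPLE_PAGES[page_range]


invoices = read_pages_one_by_one(fake_read_page, page_count=len(SAMPLE_PAGES))
for text in invoices:
    # Show the first line of each page to confirm every page was read.
    print(text.splitlines()[0])
```

The point is only that the page number is incremented inside the loop and passed to the reader on every iteration, so each invoice is handled separately.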

You might also want to look into the Regex Based Extractor if all the invoices look the same.
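
As a rough idea of what regex-based extraction looks like when every invoice shares one layout, here is a small Python sketch (again, not the UiPath extractor itself). The labels "Invoice No" and "Total", the patterns and the sample page text are all assumptions for illustration; adjust them to whatever your invoices actually contain.

```python
import re
from typing import Dict, Optional

# Assumed labels: "Invoice No" and "Total" are placeholders for the real
# field names printed on the invoices.
INVOICE_NUMBER_PATTERN = re.compile(r"\bInvoice\s+No[.:]?\s*(?P<number>\S+)", re.IGNORECASE)
TOTAL_PATTERN = re.compile(r"\bTotal[:\s]+(?P<total>[\d,]+\.\d{2})", re.IGNORECASE)


def extract_invoice_fields(page_text: str) -> Dict[str, Optional[str]]:
    """Pull the invoice number and total out of one page of text."""
    number_match = INVOICE_NUMBER_PATTERN.search(page_text)
    total_match = TOTAL_PATTERN.search(page_text)
    return {
        "invoice_number": number_match.group("number") if number_match else None,
        "total": total_match.group("total") if total_match else None,
    }


# Made-up page texts, one invoice per page.
sample_page_texts = [
    "Invoice No: INV-001\nCustomer: Alpha Ltd\nTotal: 120.50",
    "Invoice No: INV-002\nCustomer: Beta Ltd\nTotal: 1,075.00",
]

# One row per invoice, ready to be written to a table.
rows = [extract_invoice_fields(text) for text in sample_page_texts]
for row in rows:
    print(row)
```

A field that is missing on a page comes back as `None` rather than raising an error, which makes it easier to spot pages whose layout differs from the rest.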

Ioana
