I’m working on a project to extract data from a PDF file (multiple invoices, one invoice per page) into an Excel table. I have a solution for the data extraction, but it only works on page 1 and not on the following pages.
Is there some kind of loop I can try for this, and how do I instruct the robot to read the next page each time? I was thinking of assigning an incrementing variable to the Range property of the ReadPDF activity, but I’m not sure how to go about it.
There seem to be similar questions on this topic, but I can’t really find any answers.
Source code is attached and any help is much appreciated, thanks!
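The incrementing-variable idea from the question can be sketched in plain Python to show the loop logic only. This is a minimal sketch, not UiPath code: `page_ranges`, `read_pages` and `fake_read_page` are hypothetical names, and the reader is a stand-in for whatever actually returns the text of one page. It assumes the Range property accepts a single page number written as a string, such as "1" or "2".

```python
from typing import Callable, List


def page_ranges(page_count: int) -> List[str]:
    """Return one Range string per page: "1", "2", ..., up to page_count."""
    return [str(page) for page in range(1, page_count + 1)]


def read_pages(page_count: int, read_page: Callable[[str], str]) -> List[str]:
    """Call the reader once per page and collect the text in page order."""
    return [read_page(page_range) for page_range in page_ranges(page_count)]


def fake_read_page(page_range: str) -> str:
    """Hypothetical stand-in for the PDF reader (assumption, not a real API)."""
    return f"text of page {page_range}"


# One entry per invoice page, ready for the existing extraction logic.
texts = read_pages(3, fake_read_page)
```

In a workflow, the same shape would probably be a loop over a counter from 1 to the page count, with the counter converted to a string and passed to Range, and the existing page 1 extraction logic run inside the loop body.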
As I understand it, your workflow handles the 1st page, but the robot is not reading data from the 2nd page (?)
In the ReadPDF activity, have you set the Range parameter to null?
If yes, then you need to go through your code and add an extra condition to check whether the current line belongs to the next page or not.
Give it a try; otherwise I will provide you with the solution.
Can you please attach the PDF file as well? - Yes, please see the attachments. Sorry, due to sensitive information I have created something similar. In addition, I have also tidied up the source code. E-Invoice.xaml (18.4 KB) excel_invoice_multiple.pdf (53.5 KB)
As I understand it, your workflow handles the 1st page, but the robot is not reading data from the 2nd page (?) - Yes, that is correct.
In the ReadPDF activity, have you set the Range parameter to null? - I have tried setting it to “Null”, but the result is still the same.
Please try out the PDF package for splitting the PDF into individual invoices, and then you might want to try this: How to use the IntelligentOCR Package. It uses the beta Machine Learning Extractor for data extraction.
You might also want to look into the Regex Based Extractor if all the invoices look the same.
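To illustrate why regex-based extraction suits invoices that share one layout, here is a minimal Python sketch that applies the same patterns to the text of each page. It is not the Regex Based Extractor itself. The labels "Invoice No" and "Total", the patterns, and the sample page text are all assumptions made for illustration, since the real invoice layout is only in the attachment.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns; adjust the labels to match the real invoice layout.
INVOICE_NO = re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE)
TOTAL = re.compile(r"\bTotal\b\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)


def extract_fields(page_text: str) -> Dict[str, Optional[str]]:
    """Pull the invoice number and total from one page; None if not found."""
    number = INVOICE_NO.search(page_text)
    total = TOTAL.search(page_text)
    return {
        "invoice_no": number.group(1) if number else None,
        "total": total.group(1) if total else None,
    }


def extract_all(pages: List[str]) -> List[Dict[str, Optional[str]]]:
    """Apply the same extraction to every page, one result row per invoice."""
    return [extract_fields(text) for text in pages]


# Made-up sample text standing in for two single-invoice pages.
sample_pages = [
    "Invoice No: A-001\nTotal: 120.50",
    "Invoice No: A-002\nTotal: 1,075.00",
]
rows = extract_all(sample_pages)
```

Each dictionary in `rows` corresponds to one row of the Excel table, so once the PDF is split into single-invoice pages the per-page logic stays identical.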