Hi all,
I am stuck at one point here:
I have a PDF file containing 10 or more invoices. Some are 1 page, some are 2 pages, and some are 3 pages or more. I would like to extract the invoices one by one (i.e. extract the PDF pages that share the same invoice number). However, because the number of pages differs from invoice to invoice, I am not sure how to build the program. Can anyone give me some insight? Thanks!
Use filePaths = Directory.GetFiles(folderPath, "*.pdf") to get the paths of all the PDF files in the folder.
For each file, use the Read PDF Text activity with Range set to "All"; that way you don't need to worry about how many pages it has.
Use the Matches activity with an advanced regex to extract the invoice number from the PDF text.
This way you get the invoice number of each file.
Create a data table with file path and invoice number columns, and add each file's path and invoice number to it inside the For Each loop.
Afterwards, go through the invoice numbers in the data table and join the PDFs that share the same invoice number using the Join PDF Files activity.
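The extract-and-group logic in the steps above can be sketched outside the workflow, in plain Python. This is only a rough illustration, not the activity-based solution itself: the regex assumes the invoice number appears as something like "Invoice No: INV-1001", and the names INVOICE_RE, extract_invoice_number and group_by_invoice are made up for this example. The text for each file is assumed to have been extracted already.

```python
import re

# Assumed layout: "Invoice No: INV-1001", "Invoice Number: 1002", "Invoice #A-7".
# Adjust the pattern to match how the number is printed on your invoices.
INVOICE_RE = re.compile(
    r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    re.IGNORECASE,
)

def extract_invoice_number(text):
    """Return the first invoice number found in the text, or None."""
    match = INVOICE_RE.search(text)
    return match.group(1) if match else None

def group_by_invoice(items):
    """Group keys (e.g. file paths) by the invoice number found in their text.

    items is an iterable of (key, text) pairs. Keys whose text has no
    recognisable invoice number are collected under "UNMATCHED".
    """
    groups = {}
    for key, text in items:
        number = extract_invoice_number(text)
        groups.setdefault(number or "UNMATCHED", []).append(key)
    return groups

# Hypothetical sample data standing in for the text read from each PDF.
sample = [
    ("a.pdf", "Invoice No: INV-1001\nPage 1 of 2"),
    ("b.pdf", "Invoice No: INV-1001\nPage 2 of 2"),
    ("c.pdf", "Invoice Number: INV-1002"),
    ("d.pdf", "no number here"),
]
groups = group_by_invoice(sample)
```

Each entry in groups then lists the files to join for one invoice. If the text is read page by page from a single PDF instead, the keys could just as well be page numbers, which would give the page range to extract for each invoice.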