Separate single PDF Invoice file to multiple individual files

I have a single PDF file which contains multiple invoices. I need to split it into one file per invoice and name each file after its invoice number.

My problem is that some invoices span multiple pages. Since the words “Total due” appear only on the last page of each invoice, how can I create a sequence that splits the PDF based on that text?

Attaching the PDF as a sample: document.pdf (29.5 KB)


I need help with this as well.

hi @Joshua_Salazar @Lashan_Bannister_BS

Please find something to start with here…

In this post, I have provided a solution that splits the PDF on pages matching a word (Invoice), then removes that page and joins the others.

In your case, you can ignore the join.

Without any clue on page 4, splitting would be tough. I would expect the invoice # to be printed on both pages if an invoice spans multiple pages; that’s one of the basic things we do when building invoices.

I am also thinking about a solution. In the meantime, please take a look at the attached xaml and explore your options.

Hi @Joshua_Salazar @Lashan_Bannister_BS, you guys might have found a solution by this time… Recently I solved this problem for a forum member…

I am building the string based on the matches found… In this case, instead of Total Due I took “Invoice”, which is found on pages 1, 2 and 4… so we need to split into 3 PDFs. So I am building a string as shown below.

image

Finally, I pass this string to “PDF Splitter” (Bala Reva activities), which splits the file and puts the results in the output folder…

image
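
For anyone who wants the string-building idea in plain code, here is a rough Python sketch (the thread itself uses UiPath activities). The exact string format that “PDF Splitter” expects is not shown in this thread, so the comma-separated format, the function names, and the total page count of 4 used in the example are all assumptions for illustration only.

```python
def ranges_from_start_pages(start_pages, total_pages):
    """Turn the 1-based pages where the keyword was found into (first, last) ranges.

    Each invoice is assumed to run from its start page up to the page
    just before the next start page (or to the end of the document).
    """
    starts = sorted(set(start_pages))
    ranges = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - 1
        else:
            end = total_pages
        ranges.append((start, end))
    return ranges


def format_ranges(ranges, sep=","):
    """Join ranges into a string such as "1,2-3,4" (hypothetical format)."""
    parts = []
    for first, last in ranges:
        parts.append(str(first) if first == last else f"{first}-{last}")
    return sep.join(parts)


# "Invoice" found on pages 1, 2 and 4; a 4-page document is assumed here.
page_ranges = ranges_from_start_pages([1, 2, 4], total_pages=4)
range_string = format_ranges(page_ranges)
print(range_string)  # prints "1,2-3,4"
```

This only works when the keyword appears once per invoice, on its first page, which is why “Invoice” was used here instead of “Total Due”.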

Hope this helps…

Sir, kindly send me the source code.

Hi Everyone,

To implement this, you need to identify keywords that indicate the start and end of each invoice.

  1. Obtain the total page count of the PDF and iterate through each page using a ‘For Loop.’
  2. Inside the loop, extract the text of the current page using the Read PDF with OCR activity.

A. Scenario: a single-page invoice:
• Check whether both the Start and End keywords are present on the page.
• If they are, extract this page as an individual invoice.

B. Scenario: a multiple-page invoice:
• Check whether both the Start and End keywords are present on the page. For a multi-page invoice, this check returns False.
• Check whether the Start keywords are present on the current page.
• If they are, add the page index to a variable.
• In the next iteration, check directly for the End keywords.
• If the End keywords are not present, add the current page index to the variable.
• If the End keywords are present, add the current page index to the variable, extract all the pages listed in the variable as an individual invoice, and reset the variable for the next iteration. Repeat the process for each new page.
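
The two scenarios above can be sketched in Python roughly as follows. This is only an illustration of the grouping logic, not the UiPath workflow itself: it works on a list of page texts that are assumed to have been extracted already, the function name and sample texts are made up, and the keywords “Invoice” and “Total due” are taken from the earlier posts in this thread. Keeping a trailing invoice that never reaches an End keyword is also an assumption, since the steps above do not cover that case.

```python
def group_invoice_pages(page_texts, start_keyword, end_keyword):
    """Group 0-based page indexes into invoices using Start/End keywords.

    page_texts: list with the extracted text of each page, in order.
    Returns a list of lists, one list of page indexes per invoice.
    Matching is a plain, case-sensitive substring check.
    """
    invoices = []
    current = []  # page indexes of the invoice being collected
    for index, text in enumerate(page_texts):
        if not current and start_keyword not in text:
            # Not inside an invoice and no Start keyword: skip the page.
            continue
        current.append(index)
        if end_keyword in text:
            # Last page of the invoice: store the group and reset.
            invoices.append(current)
            current = []
    if current:
        # Assumption: keep an unfinished trailing invoice instead of dropping it.
        invoices.append(current)
    return invoices


# Made-up page texts mirroring the sample: invoices start on pages 1, 2 and 4.
sample_pages = [
    "Invoice 001 ... Total due 10.00",
    "Invoice 002 ... item list",
    "more items ... Total due 20.00",
    "Invoice 003 ... Total due 5.00",
]
groups = group_invoice_pages(sample_pages, "Invoice", "Total due")
print(groups)  # prints [[0], [1, 2], [3]]
```

A single-page invoice falls out of the same loop, because the page that contains the Start keyword also contains the End keyword and the group is closed immediately.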

Please check the attached document for detailed steps with screenshots.
Split_Pdf_Into_Multi_Invoices.pdf (426.8 KB)