Maybe you can help us… please see below.
To implement this, you need to identify keywords that indicate the start and end of each invoice.
- Obtain the total page count of the PDF and iterate through each page using a ‘For Loop.’
- Inside the loop, extract the text of the current page using the Read PDF with OCR activity.
A. Scenario: a single-page invoice:
• Check if both the Start and End keywords are present on the page.
• If they are, extract this page as an individual invoice.
B. Scenario: a multiple-page invoice:
• Check if both the Start and End keywords are present on the page. For a multiple-page invoice this check returns False, so continue with the steps below.
• Check if the Start keywords are present on the current page.
• If they are, add the page index to a variable that collects the pages of the current invoice (for example, a list).
• In the next iteration, directly check for the End keywords.
• If the End keywords are not present, add the current page index to the variable.
• If the End keywords are present, add the current page index to the variable, extract all the pages listed in the variable as one invoice, and then reset the variable. Repeat the process for each remaining page.
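The grouping logic from both scenarios can be sketched outside the workflow as a small Python function. This is only an illustration of the steps above, not the workflow from the attached document: it assumes the text of every page has already been extracted (for example by the OCR step), and the function name `group_invoice_pages` and the keywords `"Invoice No"` and `"Total Due"` are hypothetical placeholders for whatever markers your invoices actually contain.

```python
from typing import List


def group_invoice_pages(page_texts: List[str],
                        start_keyword: str,
                        end_keyword: str) -> List[List[int]]:
    """Group zero-based page indexes into invoices using start/end keywords.

    Hypothetical helper: page_texts holds the already-extracted text of each page.
    """
    invoices: List[List[int]] = []
    current: List[int] = []  # page indexes of the invoice being collected

    for index, text in enumerate(page_texts):
        has_start = start_keyword in text
        has_end = end_keyword in text

        if not current:
            if has_start and has_end:
                # Scenario A: single-page invoice
                invoices.append([index])
            elif has_start:
                # Scenario B: first page of a multiple-page invoice
                current = [index]
            # Pages with no Start keyword outside an invoice are skipped.
        else:
            # Inside a multiple-page invoice: only the End keywords matter.
            current.append(index)
            if has_end:
                invoices.append(current)
                current = []  # reset for the next invoice

    return invoices


# Example with made-up page texts and assumed keywords
pages = [
    "Invoice No 1 ... Total Due 100",   # single-page invoice
    "Invoice No 2 ...",                 # start of a three-page invoice
    "line items ...",
    "... Total Due 250",                # end of the three-page invoice
    "Invoice No 3 ... Total Due 75",    # single-page invoice
]
result = group_invoice_pages(pages, "Invoice No", "Total Due")
print(result)  # [[0], [1, 2, 3], [4]]
```

Each inner list is the set of page indexes to extract as one invoice. Note that an invoice whose End keywords never appear is left uncollected in this sketch; you may want to handle that case explicitly in your workflow.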
Please check the attached document for detailed steps with screenshots.
Split_Pdf_Into_Multi_Invoices.pdf (426.8 KB)