Split the PDF if a page contains the words TAX INVOICE

I’m doing invoice extraction only for the Tax Invoice page.
My invoice PDFs also contain other pages, such as a purchase details page and an order page, but I want to split out only the tax invoice.
The Tax Invoice page may be page 2, page 3, or any other page.
If a page contains the words TAX INVOICE, I want to split out only that page and save it to a particular folder.

Hey @krishna_priya ,

  1. Use the Get PDF Page Count activity to get the page count.
  2. Use a While loop to go through the pages one at a time. In each iteration, use Extract PDF Page Range to produce a single-page PDF, then use Read PDF Text on it to get that page’s text as a string.
  3. Use the Contains method to check whether the required value is in that text. If it is not, continue the loop; if it is, save that page as a PDF to the specified location and break out of the loop.
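For anyone who wants to see the same flow outside UiPath, here is a rough Python sketch of the steps above. It assumes the third-party pypdf library is installed and that the PDF has a real text layer (a scanned PDF would need OCR first). The function names, the output file name, and the whitespace normalisation are my own choices for illustration, not something from the workflow.

```python
import re
from pathlib import Path


def page_has_keyword(page_text, keyword="TAX INVOICE"):
    """Case-insensitive check that tolerates extra spaces or line breaks."""
    normalized = re.sub(r"\s+", " ", page_text or "").upper()
    return keyword.upper() in normalized


def save_first_matching_page(pdf_path, out_dir, keyword="TAX INVOICE"):
    """Save the first page containing the keyword.

    Returns the 1-based page number, or None if no page matches.
    """
    # Imported here so page_has_keyword works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    for number, page in enumerate(reader.pages, start=1):
        if page_has_keyword(page.extract_text(), keyword):
            out_dir = Path(out_dir)
            out_dir.mkdir(parents=True, exist_ok=True)
            writer = PdfWriter()
            writer.add_page(page)
            # Hypothetical output name; pick whatever suits your folder.
            with open(out_dir / f"tax_invoice_page_{number}.pdf", "wb") as handle:
                writer.write(handle)
            return number  # stop at the first match, like the Break in step 3
    return None
```

If the tax invoice can run over more than one page, drop the early return and collect every matching page instead.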

Thanks,
Sanjit

@krishna_priya

  1. Use the Get PDF Page Count activity to get the number of pages in the PDF and store it in a variable, say Pages.
  2. Declare an Int32 variable, say intIndex, with a default value of 1 (PDF page numbers start at 1).
  3. Use a Do While loop with the condition intIndex <= Pages.
  4. Inside the loop, use the Read PDF Text activity with the Range property set to intIndex.ToString and the output variable set to strText.
  5. Use an If activity with the condition strText.Contains("TAX INVOICE"). In Then, put a Break; in Else, increment intIndex by 1.

When the loop breaks, intIndex holds the page number of the Tax Invoice page.
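The counter logic is easy to get wrong by one, so here is a small Python sketch of it. It assumes the per-page text has already been read into a list (one string per page), and the function name and the "return 0 when nothing matches" convention are only illustrative.

```python
def first_matching_page(page_texts, keyword="TAX INVOICE"):
    """Return the 1-based number of the first page containing keyword, or 0."""
    page_count = len(page_texts)
    index = 1  # PDF page numbers start at 1, not 0
    while index <= page_count:
        if keyword in page_texts[index - 1]:
            return index  # equivalent to Break with the page number kept
        index += 1  # without this increment the loop would never end
    return 0  # no page contained the keyword


pages = ["Purchase details", "TAX INVOICE No. 42", "Order summary"]
print(first_matching_page(pages))  # 2
```

Checking for the "not found" case matters here: if no page matches, the counter ends up past the last page, so it should not be used as a page number without that check.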

Hope this helps.

Thanks,
Srini

Can you please share the main file?

Hi @krishna_priya

  1. Use the “Read PDF Text” activity to extract the text from your invoice document.
  2. Split the extracted text into individual pages using the “Split Text” activity. You can split the text by looking for a common pattern in the page numbers (e.g., “Page 1 of 5”, “Page 2/6”, etc.).
  3. Use a loop to iterate over each page of the invoice text.
  4. For each page, use the String Contains method to check whether the text contains the keyword “TAX INVOICE”.
  5. If the page contains the keyword, save that page to a separate folder using the “Extract PDF Page Range” activity. You can use its Range property to specify which page to extract.
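A rough Python sketch of the text-splitting idea in these steps, in case it helps to see it written out. It assumes every page ends with a footer such as “Page 1 of 5” or “Page 2/6”; the regular expression and function names are illustrative and would need adjusting to the real invoice layout.

```python
import re

# Assumed footer pattern: "Page 1 of 5" or "Page 2/6".
PAGE_MARKER = re.compile(r"Page\s+\d+\s*(?:of|/)\s*\d+", re.IGNORECASE)


def split_by_page_marker(full_text):
    """Split text into per-page chunks, treating each marker as a page end."""
    chunks = PAGE_MARKER.split(full_text)
    # Text after the final footer is usually just whitespace; drop it.
    if chunks and not chunks[-1].strip():
        chunks.pop()
    # Keep blank pages in place so the page numbering does not shift.
    return [chunk.strip() for chunk in chunks]


def pages_with_keyword(full_text, keyword="TAX INVOICE"):
    """Return the 1-based numbers of all pages whose text contains keyword."""
    pages = split_by_page_marker(full_text)
    return [
        number
        for number, page in enumerate(pages, start=1)
        if keyword.lower() in page.lower()
    ]


sample = (
    "Purchase details\nItem A\nPage 1 of 3\n"
    "TAX INVOICE\nGST 18%\nPage 2 of 3\n"
    "Order summary\nPage 3 of 3\n"
)
print(pages_with_keyword(sample))  # [2]
```

This only works when the footer text is reliably present on every page; reading the PDF one page at a time avoids that dependency.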

Hope this helps, Priya. Regards.

If you are using Document Understanding for the data extraction, use the Intelligent Keyword Classifier (IKC) in the classification step. Split out one tax invoice page manually and use it to train the IKC, which will then be able to split and classify the document.
Alternatively, you can take the classification bounds from the classification result object and split the document using the PDF activities.