Split the PDF if a page contains the words TAX INVOICE

krishna_priya · May 10, 2023, 11:37am

I'm doing invoice extraction, but only for the Tax Invoice page.
My invoices contain different pages, such as a purchase details page and an order page, and I want to split out only the tax invoice.
The tax invoice may be on page 2, on page 3, or on any other page.
If a page in the invoice contains the words TAX INVOICE, I want to split out only that page and save it to a particular folder.

Sanjit_Pal · May 10, 2023, 11:48am

Hey @krishna_priya ,

Use the Get PDF Page Count activity to get the page count.
Use a While loop to go through each page: Extract PDF Page Range gives you a single page as output, and Read PDF Text on that page gives you its text as a string.
Use the Contains method to check whether the required value is in that text. If it is not, continue the loop; if it is, save that page as a PDF to the specified location and break out of the loop.
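
For anyone who wants to try the same logic outside Studio, here is a minimal Python sketch of the steps above. It is not a UiPath workflow: it assumes the third-party pypdf library is installed, and the function names first_page_with and save_tax_invoice_page and the output file name are made up for illustration.

```python
from pathlib import Path
from typing import Optional, Sequence


def first_page_with(page_texts: Sequence[str], keyword: str) -> Optional[int]:
    """Return the 1-based number of the first page whose text contains keyword."""
    for number, text in enumerate(page_texts, start=1):
        if keyword in text:
            return number  # stop at the first hit, like the break in the loop
    return None


def save_tax_invoice_page(pdf_path: str, out_dir: str,
                          keyword: str = "TAX INVOICE") -> Optional[Path]:
    """Copy the first page containing keyword into out_dir as a one-page PDF."""
    # pypdf is a third-party package; it is imported here so the helper above
    # can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() or "" for page in reader.pages]
    page_number = first_page_with(texts, keyword)
    if page_number is None:
        return None  # no page mentions the keyword

    writer = PdfWriter()
    writer.add_page(reader.pages[page_number - 1])  # pypdf pages are 0-based
    target = Path(out_dir) / f"{Path(pdf_path).stem}_tax_invoice.pdf"
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as handle:
        writer.write(handle)
    return target
```

The page search is kept separate from the file handling so it can be checked on plain strings before pointing it at real invoices.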

Thanks,
Sanjit

Srini84 · May 10, 2023, 11:50am

@krishna_priya

Use the Get PDF Page Count activity to get the number of pages in the PDF and store it in a variable, say Pages
Declare an Int32 variable, let's say intIndex, with a default value of 1, since PDF page numbers start at 1
Use a Do-While loop and write the condition as intIndex <= Pages
Use the Read PDF Text activity inside the Do-While loop, and in its properties set Range to intIndex.ToString and the output variable to strText
Use an If activity with the condition strText.Contains("TAX INVOICE"). In Then, put a Break; in Else, increase intIndex by 1

In the end, the page number of the tax invoice will be stored in the intIndex variable
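
One thing to watch: Contains is case-sensitive, and text read from a PDF can have line breaks or extra spaces between the two words, so a plain check for "TAX INVOICE" may miss a page that shows "Tax Invoice". Below is a small Python sketch of a more forgiving check. The helper names normalize and page_mentions are only illustrative. In Studio, something like strText.ToUpper.Contains("TAX INVOICE") would cover the case difference, though not the extra whitespace.

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space and upper-case the text."""
    return re.sub(r"\s+", " ", text).strip().upper()


def page_mentions(text: str, keyword: str = "TAX INVOICE") -> bool:
    """Check for keyword while ignoring case and irregular spacing."""
    return normalize(keyword) in normalize(text)
```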

Hope this may help you

Thanks,
Srini

krishna_priya · May 10, 2023, 12:09pm

Can you please share the main file?

mkankatala · May 10, 2023, 12:20pm

Hi @krishna_priya

Use the “Read PDF Text” activity to extract the text from your invoice document.
Split the extracted text into individual pages using the “Split Text” activity. You can split the text on a common pattern in the page numbering (e.g., “Page 1 of 5”, “Page 2/6”, etc.).
Use a loop to iterate over each page of the invoice text.
For each page, use the Contains method in an If activity to check whether it contains the keyword “TAX INVOICE”.
If the page contains the keyword, save the page to a separate folder using the “Extract PDF Page Range” activity. You can use the “Range” property to specify which page to extract.
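
The text-splitting idea above can be sketched in Python as follows. This assumes every page ends with a footer of the form "Page N of M"; if your invoices use a different pattern (or none), the regular expression has to change. The names PAGE_FOOTER, split_into_pages and pages_containing are hypothetical.

```python
import re
from typing import List

# Assumed footer format: "Page 1 of 5". Adjust to match your invoices.
PAGE_FOOTER = re.compile(r"Page\s+\d+\s+of\s+\d+", re.IGNORECASE)


def split_into_pages(full_text: str) -> List[str]:
    """Split the text of the whole document into one chunk per page."""
    chunks = PAGE_FOOTER.split(full_text)
    # The text after the last footer is normally empty; drop only that chunk
    # so that blank pages in the middle keep their position.
    if chunks and not chunks[-1].strip():
        chunks.pop()
    return [chunk.strip() for chunk in chunks]


def pages_containing(full_text: str, keyword: str = "TAX INVOICE") -> List[int]:
    """Return the 1-based numbers of all pages whose text contains keyword."""
    return [
        number
        for number, chunk in enumerate(split_into_pages(full_text), start=1)
        if keyword in chunk
    ]
```

The page numbers returned here are what you would then pass to the activity that extracts pages from the PDF.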

Hope this helps you, Priya. Regards.

Anas-p-v · May 11, 2023, 5:26am

If you are using Document Understanding for the data extraction, use the Intelligent Keyword Classifier (IKC) in the classification step: split out one tax invoice page manually and use it to train the classifier. The IKC will then be able to split and classify the document.
Alternatively, you can take the classification bounds from the classification result object and split the document yourself using the PDF activities.
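
As a rough illustration of the second option, the Python sketch below turns classification bounds into page range strings such as "2-3". It does not use the real Document Understanding objects: the Bound tuple, its field names, the 0-based start page and the "first-last" range format are all assumptions made for the example, so check them against your actual classification results.

```python
from typing import List, NamedTuple


class Bound(NamedTuple):
    """Hypothetical stand-in for one classified section of the document."""
    doc_type: str
    start_page: int   # assumed to be 0-based
    page_count: int


def page_ranges(bounds: List[Bound], wanted: str) -> List[str]:
    """Build 1-based page range strings for every section of type wanted."""
    ranges = []
    for bound in bounds:
        if bound.doc_type != wanted:
            continue
        first = bound.start_page + 1
        last = bound.start_page + bound.page_count
        ranges.append(str(first) if first == last else f"{first}-{last}")
    return ranges
```

Each returned string describes one tax invoice section, so a tax invoice that spans two pages comes out as a single range.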
