PDF automation solution

I have a PDF document that contains 50 pages.

Let’s assume I search for the word 9ac in that PDF document. Once I find it, I want to extract the pages from the beginning of the document up to the page where the word was found.

I need a solution for this. If anyone knows one, kindly share the step-by-step solution to complete the task.

Thanks in advance.

Hi @Iswarya_P1

Please share the input PDF so that a step-by-step solution can be built.
Regards,

Hi @Iswarya_P1,

Please find the attached workflow that will extract the pages from the beginning of a PDF document until the page where a specific word is found.

Summary of steps:

  1. Get the total PDF page count.
  2. Use a While loop to traverse the PDF pages from the beginning, as long as PDF Page Count > 0.
  3. Within the While loop, read the current PDF page as text.
  4. Use the Is Text Matching activity to check whether the keyword is present on the current PDF page.
  5. Once the keyword is found, use the Extract PDF Page Range activity to generate a new PDF from the beginning up to the page where the keyword was found.
  6. Break out of the loop.

For proof of concept, I utilized a SamplePDF consisting of 5 pages stored in the Data folder of the bot.
The specific word to be searched is located on page 3.
The attached workflow performs a search for the given word on each page of the PDF, starting from the beginning until the first match is found.
Once the initial match is identified, the bot generates a new PDF in the Output folder. This new PDF includes pages starting from the beginning up to the page where the text was matched.
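For readers who want to see the same logic outside Studio, here is a minimal Python sketch of the search loop. It is not the attached workflow. The function names are hypothetical, and the optional `extract_until_keyword` helper assumes the third-party `pypdf` library is installed. The demo at the bottom uses a plain list as a stand-in for the 5-page sample with the keyword on page 3.

```python
from typing import Callable, Optional


def find_first_matching_page(page_count: int,
                             read_page_text: Callable[[int], str],
                             keyword: str) -> Optional[int]:
    """Return the 1-based number of the first page containing keyword, or None."""
    page = 1
    while page <= page_count:           # traverse pages from the beginning
        text = read_page_text(page)     # read the current page as text
        if keyword in text:             # plain, case-sensitive substring check
            return page                 # stop at the first match (the "break")
        page += 1
    return None                         # keyword was not found on any page


def extract_until_keyword(src_path: str, dst_path: str,
                          keyword: str) -> Optional[int]:
    """Write pages 1..match of src_path to dst_path (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    reader = PdfReader(src_path)
    match = find_first_matching_page(
        len(reader.pages),
        lambda n: reader.pages[n - 1].extract_text() or "",
        keyword,
    )
    if match is None:
        return None                     # nothing to extract
    writer = PdfWriter()
    for i in range(match):              # copy pages 1..match (0-based indexes)
        writer.add_page(reader.pages[i])
    writer.write(dst_path)
    return match


# Stand-in for the 5-page sample: the keyword sits on page 3.
sample_pages = ["intro", "details", "code 9ac here", "more", "end"]
result = find_first_matching_page(len(sample_pages),
                                  lambda n: sample_pages[n - 1], "9ac")
print(result)
```

Keeping the page-reading step behind a callable means the search loop can be checked with plain strings before it is pointed at a real PDF.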

Note: I used the “UiPath.PDF.Activities” package for PDF processing, which you can install from “Manage Packages” under the Studio “Home” tab. I have also added annotations to the workflow to make the steps clearer.

Please let me know if the proposed solution works for you, cheers!

ExtarctPdfPages_BeginningToEndOfKeywordPage.zip (65.9 KB)

@Iswarya_P1

Follow these steps:

Initialize an Integer variable endcount (default value 0).

  1. Read the PDF page count into a variable, pdfcount.
  2. Add a While loop with the condition set to True and the max iteration count set to pdfcount, and declare a variable (indexvar) for the index property of the While loop.
  3. Inside the loop, use Read PDF Text for a specific page: set the range to (indexvar+1).ToString, assuming the loop index starts at 0, and read the text into str.
  4. Add an If with the condition str.Contains("9ac").
  5. On the Then side, use an Assign to save indexvar+1 (the matching page number) to endcount, and then Break.
  6. Outside the loop, add an If with the condition endcount=0. The Then side means the word was not found.
  7. On the Else side, use the Extract PDF Page Range activity and set the range to "1-" + endcount.ToString
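The control flow of these steps can be sketched in Python as follows. This is only an illustration. `build_page_range` is a hypothetical name, the list of strings stands in for reading each page of the PDF, and the sketch stores the 1-based page number in `end_count` so that 0 can safely mean "not found".

```python
from typing import List, Optional


def build_page_range(page_texts: List[str], keyword: str) -> Optional[str]:
    """Return a range string such as "1-3", or None when keyword is absent."""
    pdf_count = len(page_texts)        # total page count
    end_count = 0                      # 0 means "not found"

    for index in range(pdf_count):     # at most pdf_count iterations, index starts at 0
        page_number = index + 1        # PDF page numbers start at 1
        text = page_texts[index]       # stand-in for reading page `page_number`
        if keyword in text:            # same idea as str.Contains("9ac")
            end_count = page_number    # remember the matching page
            break                      # stop at the first match

    if end_count == 0:
        return None                    # keyword not found on any page
    return "1-" + str(end_count)       # range from page 1 to the matching page


print(build_page_range(["a", "b", "x 9ac y", "d"], "9ac"))
```

Capping the loop at the page count plays the same role as setting the max iteration count on a While loop whose condition is always True.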

Hope this helps

Cheers

What should I put in the While condition? Can you guide me? @Anil G

@Iswarya_P1

Steps 3 to 5 go inside the While loop. The condition will be True and the max iteration count will be pdfCount.

cheers