Read PDF and find page number where text occurs

I am trying to find which pages a certain string of text appears on across a group of PDFs. So far my process is:

  1. Get PDF files from directory
  2. Loop through the PDFs with a For Each and use Read PDF Text to write each PDF's text to a variable
  3. Identify if a string of text or formatting of text occurs in the PDF (I think this can be done via RegEx or other string functions)
  4. Identify which pages of the PDF match the text criteria

So far, I have no trouble reading the PDF to text, and I have found a method to get the total number of pages in a PDF, but I cannot figure out how to identify which pages the text occurs on.

Can anyone help me identify where in the PDF my text occurs? The PDFs will vary in length, and the matching text will appear on different pages in each one.

Hi @AndrewRoda

try this

or use Start Process to open the PDF,
and then use Ctrl+F to run the search

Thanks
Ashwin S

Hi @AndrewRoda

Can you share your PDF along with the text you want to search for, so that I can help you with the solution?

Hi @AndrewRoda,
Welcome to the UiPath forum!

You can try it as below.
Assign int i=1
Assign strPageNumbers

Inside your main For Each over the PDF files, use a While activity with the condition i <= pdfPageCount.
Inside the While, use the Read PDF Text activity and set its page range to i.ToString.
Then add a condition that checks for your string in that page's text (check it in your PDF text output variable).
If the page contains your string, add that page number to the list: pdfPageCount = pdfPageCount + i.ToString + "," (the "," separates the page numbers).
At the end of the While body, increment i = i + 1.

Do the same for each PDF inside your main For Each loop, resetting i to 1 and clearing strPageNumbers before each file.
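
For anyone who wants to check the logic outside UiPath, the loop above can be sketched in Python. This is only a minimal sketch under assumptions: `find_pages` and `sample_pages` are hypothetical names, and the list of strings stands in for what Read PDF Text would return when called once per page. It is not UiPath code.

```python
def find_pages(page_texts, needle):
    """Return the 1-based page numbers whose text contains `needle`."""
    matches = []
    i = 1  # mirrors the counter i in the steps above
    while i <= len(page_texts):  # len(page_texts) plays the role of pdfPageCount
        if needle in page_texts[i - 1]:  # the per-page "contains" check
            matches.append(i)
        i += 1  # increment at the end of the loop body
    return matches


# Hypothetical per-page text, standing in for one Read PDF Text call per page
sample_pages = [
    "Cover sheet",
    "Invoice total: 120.00",
    "Terms and conditions",
    "Invoice total: 75.50",
]

page_numbers = find_pages(sample_pages, "Invoice total")
str_page_numbers = ",".join(str(n) for n in page_numbers)
print(str_page_numbers)  # 2,4
```

Joining the numbers at the end avoids the trailing comma that plain string concatenation leaves behind. If you build the string by concatenation in UiPath, something like strPageNumbers.TrimEnd(","c) should remove it.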

Hope this will work for you.
Thanks!

Thanks, Deepak! This is the closest to what I envision my solution as. I will give it a try and reply here if I am successful.

I am unable to share the PDF because it contains sensitive information. Reading the PDF is not a problem: the pages I need to find are text-based, so they always show up in the Read PDF Text output when present in the file.

My issue is finding which page of the PDF the text occurs on once I have found it in the Read PDF Text output.

@Deepak94 what do you Assign strPageNumbers to/as?

Hi @gregelliott,

It is a String variable, used to accumulate the matching page numbers:
strPageNumbers = strPageNumbers + i.ToString + "," (the "," separates the page numbers).
But in the post above, I wrote it as pdfPageCount by mistake.

Thanks!
