I am trying to find which pages a certain string of text appears on across a group of PDFs. So far my process is:
Get PDF files from directory
Loop through the PDFs with a For Each and use Read PDF Text to write each PDF's text to a variable
Identify if a string of text or formatting of text occurs in the PDF (I think this can be done via RegEx or other string functions)
Identify which pages of the PDF match the text criteria
So far, I have no trouble reading the PDF to text, and I have found a method to get the total number of pages in a PDF, but I cannot figure out how to identify which pages the text occurs on.
Can anyone help me identify where in the PDF my text occurs? The PDFs vary in page count, and the text will appear on different pages in each one.
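You can try the steps below.
Assign i = 1 (an Int32 variable)
Assign strPageNumbers (a String variable that will collect the matching page numbers)
Inside your main For Each over the PDF files, use a While activity with the condition i <= pdfPageCount.
Inside the While, use the Read PDF Text activity and set the page range to i.ToString, so that only page i is read.
Now add your condition to check for your particular string on that page. (Check it in your PDF text output variable.)
If the page contains your string, then add that particular page number: pdfPageCount = pdfPageCount + i.ToString + "," (the "," is to separate each page number).
At the end of the While body, but still inside it, increment i = i + 1.
Do the same for each PDF inside your main For Each loop over the PDF files, resetting i to 1 and clearing strPageNumbers before each file so that page numbers from one PDF do not carry over to the next.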
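The per-page loop above can be sketched outside UiPath as follows. This is only a minimal Python sketch, not the UiPath workflow itself: it assumes the text of each page has already been extracted into a list (one entry per page, as reading one page at a time would produce), and pages_matching, sample_pages, and the invoice pattern are hypothetical names and data chosen for illustration.

```python
import re


def pages_matching(page_texts, pattern):
    """Return the 1-based page numbers whose text matches the regex pattern."""
    regex = re.compile(pattern)
    matches = []
    # enumerate(..., start=1) plays the role of the counter i in the While loop.
    for page_number, text in enumerate(page_texts, start=1):
        if regex.search(text):
            matches.append(page_number)
    return matches


# Hypothetical sample data standing in for the per-page text output.
sample_pages = [
    "Cover page",
    "Invoice No: 12345",
    "Terms and conditions",
    "Invoice No: 67890",
]

matching_pages = pages_matching(sample_pages, r"Invoice No: \d+")
# Join the page numbers with "," as the separator between them.
page_list = ",".join(str(n) for n in matching_pages)
print(page_list)  # prints 2,4
```

Collecting the page numbers in a list and joining them at the end is a design choice of this sketch; it gives the same comma-separated result as appending to a string on each match.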
I am unable to share the PDF because it contains sensitive information. I am fine with reading the PDF, because the pages I need to find are text, so they will always show up in the Read PDF Text output if present in the file.
My issue is with finding the page number of the PDF the text occurs on when I find the text in the Read PDF Text output.
strPageNumbers is a String variable, and it is used in this assignment:
strPageNumbers = strPageNumbers + i.ToString + "," (the "," is to separate each page number)
In my post above, I wrote pdfPageCount there by mistake.
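One thing to be aware of with this kind of assignment is that the result ends with a trailing separator. The Python sketch below mimics the string accumulation to show that; append_page is a hypothetical helper, and the page numbers 2 and 4 are made-up example values.

```python
def append_page(str_page_numbers, i):
    """Mimic strPageNumbers = strPageNumbers + i.ToString + ","."""
    return str_page_numbers + str(i) + ","


accumulated = ""
for i in (2, 4):  # pretend pages 2 and 4 contained the text
    accumulated = append_page(accumulated, i)

# The accumulated string ends with a comma, so strip it before using the result.
trimmed = accumulated.rstrip(",")
print(accumulated)
print(trimmed)
```

In a VB.NET expression, a similar cleanup could probably be done with strPageNumbers.TrimEnd(","c) once the loop has finished.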