Page number from PDF

How do I get the PDF page number by searching for a particular word or sentence?

Example: the phrase “Detailed audit findings” appears on page 3, and I want to extract that page number, i.e. 3.

Hi @yashashwini2322 ,

Could you check the post below? It was a similar requirement, and we can adapt the approach to your needs:

Hello! I dealt with the same issue, and the easiest way for me was to use a PDF page count activity that retrieves the total number of pages in the PDF, and then a counter that increases by one as you iterate through the PDF.
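The page-count-plus-counter idea can be sketched in Python, assuming the text of each page has already been extracted into a list (for example with Read PDF Text using a one-page Range, or a library such as pypdf). The function name `find_phrase_pages` is hypothetical, and the matching is case-insensitive with whitespace collapsed, because extracted PDF text often contains extra line breaks.

```python
def find_phrase_pages(page_texts: list[str], phrase: str) -> list[int]:
    """Return the 1-based page numbers whose text contains `phrase`.

    Assumes page_texts[i] holds the already-extracted text of page i + 1.
    """
    needle = " ".join(phrase.split()).lower()
    page_count = len(page_texts)   # plays the role of "Get PDF Page Count"
    pdf_page_counter = 1           # pages are numbered from 1
    found: list[int] = []
    while pdf_page_counter <= page_count:
        # Collapse whitespace so a phrase split across lines still matches.
        text = " ".join(page_texts[pdf_page_counter - 1].split()).lower()
        if needle in text:
            found.append(pdf_page_counter)
        pdf_page_counter += 1      # "PdfPageCounter = PdfPageCounter + 1"
    return found


if __name__ == "__main__":
    pages = ["Cover", "Contents", "Detailed audit findings\nFinding 1", "Appendix"]
    print(find_phrase_pages(pages, "Detailed audit findings"))  # [3]
```

Returning every matching page rather than the first one is a design choice; take `found[0]` if only the first occurrence matters.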

Hope it helps! :slight_smile:

Could you elaborate?

Sure! There is an activity called “Get PDF Page Count” that retrieves the number of pages in your PDF. Then create a variable called “PdfPageCounter” and use an Assign activity to set it to “PdfPageCounter + 1”, making sure all of this sits inside a loop such as a “Do While”.

In my case, I have a “For Each File in Folder”, then the “Get PDF Page Count”, and after that a “Do While”; inside that activity is the counter that increases with each iteration.
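A rough Python equivalent of the “For Each File in Folder” → “Get PDF Page Count” → loop structure is sketched below. `search_folder`, `get_page_count` and `read_page_text` are hypothetical names: the two hooks stand in for the UiPath activities, so any PDF library could be plugged in. The detail worth noticing is that the counter is reset to 1 for every file. Python has no Do While, so this uses a plain while loop, which also avoids reading a page from a file that reports zero pages.

```python
import tempfile
from pathlib import Path
from typing import Callable


def search_folder(
    folder: str,
    phrase: str,
    get_page_count: Callable[[Path], int],
    read_page_text: Callable[[Path, int], str],
) -> dict[str, list[int]]:
    """For each PDF in `folder`, return the 1-based pages containing `phrase`.

    get_page_count and read_page_text are hypothetical hooks standing in for
    "Get PDF Page Count" and "Read PDF Text" with Range set to one page.
    """
    needle = " ".join(phrase.split()).lower()
    results: dict[str, list[int]] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):  # "For Each File in Folder"
        page_count = get_page_count(pdf)             # "Get PDF Page Count"
        counter = 1                                  # reset for every file
        hits: list[int] = []
        while counter <= page_count:
            text = " ".join(read_page_text(pdf, counter).split()).lower()
            if needle in text:
                hits.append(counter)
            counter += 1                             # increment per page
        results[pdf.name] = hits
    return results


if __name__ == "__main__":
    # Fake page texts keyed by file name, in place of real PDF reading.
    fake = {"a.pdf": ["Intro", "Detailed audit findings"], "b.pdf": ["Summary"]}
    with tempfile.TemporaryDirectory() as tmp:
        for name in fake:
            (Path(tmp) / name).touch()  # empty placeholder files
        print(search_folder(
            tmp,
            "Detailed audit findings",
            get_page_count=lambda p: len(fake[p.name]),
            read_page_text=lambda p, n: fake[p.name][n - 1],
        ))
    # {'a.pdf': [2], 'b.pdf': []}
```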

Hope I helped! Let me know if you need anything else. :slight_smile:

Hi @yashashwini2322.

1. Initialize a variable with the default value 1.
2. Get the PDF page count and store it in a variable.
3. Use a While loop with the condition variable <= pdfpagecountvariable.
4. Use Read PDF Text (or Read PDF With OCR for scanned PDFs) and pass the variable into the Range property in the Properties panel.
5. Add an If condition if there is a criterion such as a specific page that should be read; if there is no such condition, you can skip this step.
6. Use the Matches activity and pass the regex expression into it.
7. Use a For Each and pass it the output of the Matches activity.
8. Use Write Cell to write the extracted data into Excel.
9. Finally, inside the While loop, use an Assign activity to increment the variable.

This will be the sample workflow.
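The nine steps can be mapped onto a short Python sketch, assuming the page texts are already extracted into a list. `extract_matches` and `write_rows` are hypothetical names, the regex is only an example built from the phrase in the question, and CSV output via the standard `csv` module is swapped in for Excel's Write Cell. Storing the page number next to each match is what gives you the page number the question asks for.

```python
import csv
import io
import re
from typing import Iterable, Optional, TextIO


def extract_matches(
    page_texts: list[str],
    pattern: str,
    only_pages: Optional[set[int]] = None,
) -> list[tuple[int, str]]:
    """Collect (page_number, matched_text) rows, following steps 1-9.

    Assumes page_texts[i] is the text read for page i + 1.
    """
    regex = re.compile(pattern)
    rows: list[tuple[int, str]] = []
    page_count = len(page_texts)     # step 2: page count
    page = 1                         # step 1: counter starts at 1
    while page <= page_count:        # step 3: loop while counter <= page count
        text = page_texts[page - 1]  # step 4: text of the current page only
        # Step 5 (optional): skip pages that are not of interest.
        if only_pages is None or page in only_pages:
            for match in regex.finditer(text):  # steps 6-7: Matches + For Each
                rows.append((page, match.group(0)))
        page += 1                    # step 9: increment the counter
    return rows


def write_rows(rows: Iterable[tuple[int, str]], out: TextIO) -> None:
    """Step 8 stand-in: write the rows as CSV instead of an Excel sheet."""
    writer = csv.writer(out)
    writer.writerow(["Page", "Match"])
    writer.writerows(rows)


if __name__ == "__main__":
    pages = ["Cover", "Summary", "Detailed audit findings", "Appendix"]
    rows = extract_matches(pages, r"(?i)detailed\s+audit\s+findings")
    print(rows)  # [(3, 'Detailed audit findings')]
    buf = io.StringIO()
    write_rows(rows, buf)
    print(buf.getvalue())
```

Using `\s+` between the words lets the pattern match even when extraction splits the phrase across lines.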

Hope it works

Hi @yashashwini2322.

Please see the attached flow

Hope it works