Get specific page number based on keyword from PDF files

Hi,
I want to create a robot that can detect a certain keyword in a PDF file and get the page number where the keyword is found. The keyword's position is not fixed: it can appear on a different page in each PDF file. Is it possible to do this? Can anyone please help me with this?

@Aoyama Can you send a sample PDF?

Hi @shibani,
Thank you for your response. Below is the sample PDF file, as requested.

REPORT INTRA AIBOTS SDN BHD (AutoRecovered).pdf (1.7 MB)

For example, in this file, I want to search for the keyword “chapter 1” and then get the page number where that keyword is found. If possible, I do not want the robot to open the PDF file to search for that keyword. Can you help me with this?

Hi @Aoyama,
What is your expected location for the keyword? I mean, do you want a page number, or the exact line position?

Hi,
I want the page number where the keyword is located.

Hi @Aoyama,
First, you need to find the number of pages in the PDF.
Many activities provide a PDF page count; you just need to pass the file path. Use any of them to get the page count.

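Declare int index = 1

While index <= page count
Use the Read PDF activity and pass index as the range; you will get a text variable containing all the text from that page - txtFromPDF
Check for the presence of the keyword using the Contains method,
i.e., If txtFromPDF.Contains(“keyword”)

If true, then your keyword is found at page number index.
Otherwise, increment index by 1 and repeat the loop. Note that Contains is case-sensitive, so “chapter 1” will not match “Chapter 1” unless you normalize the case first.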
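
The loop above can also be sketched outside UiPath. Here is a minimal Python version, assuming the third-party pypdf library stands in for the Read PDF activity; the function names and the case-insensitive comparison are my own choices for illustration, not something from the original steps. The search helpers take any iterable of page texts, so they stop reading as soon as the first match is found.

```python
from typing import Iterable, Iterator, List, Optional


def pages_containing(page_texts: Iterable[str], keyword: str) -> List[int]:
    """Return the 1-based numbers of all pages whose text contains keyword."""
    needle = keyword.casefold()  # assumption: match regardless of case
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in (text or "").casefold()
    ]


def first_page_containing(page_texts: Iterable[str], keyword: str) -> Optional[int]:
    """Return the first 1-based page number containing keyword, or None."""
    needle = keyword.casefold()
    for number, text in enumerate(page_texts, start=1):
        if needle in (text or "").casefold():
            return number  # stop here; later pages are never extracted
    return None


def iter_pdf_page_texts(pdf_path: str) -> Iterator[str]:
    """Lazily yield the text of each page, one page at a time."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so the two search helpers above work without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    for page in reader.pages:
        # extract_text() can return an empty result for image-only pages
        yield page.extract_text() or ""
```

Example usage, with a hypothetical file name: `first_page_containing(iter_pdf_page_texts("report.pdf"), "chapter 1")`. Because the pages are produced lazily, a long document is only read up to the page where the keyword first appears. Keep in mind that a table of contents may contain the same keyword, in which case `pages_containing` gives every matching page to choose from.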

Hi @shankm,
Thank you for your instruction. I will try it and will get back to you.

Hi, in my case the PDF contains more than 200 pages. I can't extract every page because it takes too much time. Can anyone help me solve this?