Get specific page number based on keyword from PDF files

Aoyama · January 9, 2020, 3:13am

Hi,
I want to create a robot that can detect a certain keyword in a PDF file and get the number of the page where the keyword is found. The keyword is not on the same page in every PDF file; it may appear on a different page in each file. Is this possible? Can anyone please help me with this?

shibani · January 9, 2020, 4:46am

@Aoyama Can you send a sample PDF?

Aoyama · January 9, 2020, 6:33am

Hi @shibani,
Thank you for your response. Below is the sample PDF file, as requested.

REPORT INTRA AIBOTS SDN BHD (AutoRecovered).pdf (1.7 MB)

For example, in this file I want to search for the keyword “chapter 1” and get the number of the page where it is found. If possible, I do not want the robot to open the PDF file in a viewer to search for the keyword. Can you help me with this?

shankm · January 9, 2020, 6:44am

Hi @Aoyama,
What output do you expect for the keyword's location? I mean, do you want the page number or the exact line position?

Aoyama · January 9, 2020, 6:58am

Hi,
I want the page number where the keyword is located.

shankm · January 9, 2020, 7:47am

Hi @Aoyama,
First, you need to find the number of pages in the PDF.
Several activities return the PDF page count; you only need to pass the file path. Use any of them to get pageCount.

Declare int index = 1

While index <= pageCount
Use the Read PDF Text activity and pass index.ToString as the Range, so you get a text variable containing all the text from that page - txtFromPDF
Check for the keyword with the Contains method,
i.e. If txtFromPDF.Contains("keyword")

If true, the keyword is on page number index and you can exit the loop. Otherwise, increment index (index = index + 1) and check the next page.
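
The loop above can also be sketched outside Studio. This is only a minimal Python illustration of the same idea, not the UiPath workflow itself: it assumes the text of each page has already been extracted into a sequence of strings (one per page), and the names iter_keyword_pages, first_keyword_page and sample_pages are hypothetical. The match is made case-insensitive here, which is an assumption so that "chapter 1" also matches "Chapter 1".

```python
from typing import Iterable, Iterator, Optional


def iter_keyword_pages(page_texts: Iterable[str], keyword: str) -> Iterator[int]:
    """Yield the 1-based number of every page whose text contains keyword."""
    needle = keyword.casefold()  # case-insensitive match (an assumption)
    for page_number, text in enumerate(page_texts, start=1):
        if needle in text.casefold():
            yield page_number


def first_keyword_page(page_texts: Iterable[str], keyword: str) -> Optional[int]:
    """Return the first matching page number, or None if there is no match.

    Because iter_keyword_pages is a generator, scanning stops at the first
    hit, so later pages are never read when page_texts is itself lazy.
    """
    return next(iter_keyword_pages(page_texts, keyword), None)


# Hypothetical page texts standing in for the per-page output of a PDF reader.
sample_pages = [
    "Table of contents",
    "Chapter 1: Introduction",
    "More of chapter 1",
    "Chapter 2: Method",
]

print(list(iter_keyword_pages(sample_pages, "chapter 1")))  # [2, 3]
print(first_keyword_page(sample_pages, "chapter 1"))  # 2
```

Page numbers start at 1 to match the index variable in the steps above. Stopping at the first match keeps the work proportional to how early the keyword appears, which matters for long files.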

Aoyama · January 9, 2020, 8:07am

Hi @shankm,
Thank you for your instruction. I will try it and will get back to you.

vinothkumar2905 · April 17, 2020, 5:48am

Hi, in my case the PDF contains more than 200 pages. I can't extract every page because it takes too long. Can anyone help with this situation?

puneet.bansal21 · January 6, 2021, 7:13am

Hi Aoyama,
Please let me know whether shankm's solution worked for you. It's working for me.

sundalpathyREBORN · January 6, 2021, 7:16am

It is not working for me.

puneet.bansal21 · January 6, 2021, 7:50am

PDF_Page_Count_Extract_WithDynamicKeyword.xaml (10.0 KB)

Please find attached a workflow that gets the page count, searches for a keyword, and extracts the matching page from the PDF.

sundalpathyREBORN · January 8, 2021, 2:31am

bergwin
