Finding a word in a document, extracting the sentence it's in

Short · August 19, 2022, 1:57pm

Hi all

This may be a bit of a long shot but I wondered if anyone had any ideas on how to do the process someone has asked me to look into…

Step 1 - see if any word from a list of words appears in a PDF document
Step 2 - if it does, either

A) grab the page number of the page it appears on, then carry on searching through the document
or
B) grab the whole sentence it appears in (this may be the weaker of the two options, in case something important is missed in the next sentence)

Step 3 - enter the information obtained (page number or sentence) in an Excel table

e.g. - if the word was “Contractor”, it would find it and write back to Excel that the word was found on page number 10:

The documents will be anywhere from 50 to 300 pages. They’ll be sent from different companies, so there’s no set template, and the word may or may not appear in a table (as shown above).

I can manage steps 1 and 3 but I have no idea how to do step 2. Any ideas please?

Thanks

postwick · August 19, 2022, 3:43pm

Read the text into a variable and then use RegEx to find the word and extract everything between the previous and next periods.
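
A minimal sketch of that idea in Python, assuming the document text is already in a string. Rather than searching outward from each match to the nearest periods, it splits the text into sentences first and keeps the ones containing the word, which gives the same result for simple prose. The function name `sentences_containing` and the sample text are made up for illustration, and the split rule is naive: abbreviations such as "e.g." or "Ltd." will break a sentence early.

```python
import re

def sentences_containing(text, word):
    """Return every sentence in `text` that contains `word` as a whole word."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Whole-word, case-insensitive match; re.escape guards special characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s.strip() for s in sentences if pattern.search(s)]

sample = (
    "The Contractor shall provide all materials. "
    "Payment is due in 30 days. "
    "The contractor may appeal."
)
print(sentences_containing(sample, "Contractor"))
# ['The Contractor shall provide all materials.', 'The contractor may appeal.']
```

Text that sits inside a table often has no periods at all, so for those cases the page number is probably the safer thing to record.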

MarinAlexandru · August 19, 2022, 3:48pm

Hello,

I would solve this problem with a couple of lines in Python but I think I have found a way in UiPath also…

Basically, you get the number of pages in the PDF document and loop through them, reading each page one by one (by using the Range property of the Read PDF Text activity) and checking whether the word is on that page. From there you can add the page number to a list, print it to the console, etc.

I tested it on a 422-page document.
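
For comparison, the same page-by-page loop can be sketched in Python. This is not the UiPath workflow, only an illustration of the logic, and it assumes the text of each page has already been extracted into a list of strings (for example with a PDF library such as pypdf, calling `extract_text()` on each entry of `PdfReader(path).pages`). The function name `pages_containing` and the sample pages are hypothetical.

```python
import re

def pages_containing(page_texts, words):
    """Map each word to the 1-based page numbers on which it appears."""
    hits = {word: [] for word in words}
    for page_number, text in enumerate(page_texts, start=1):
        for word in words:
            # Whole-word, case-insensitive search on this page only.
            if re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE):
                hits[word].append(page_number)
    return hits

# Stand-in for text extracted from a three-page PDF.
pages = [
    "Introduction and scope.",
    "The Contractor shall provide all materials.",
    "Appendix: contractor details and fees.",
]
print(pages_containing(pages, ["Contractor", "Fees"]))
# {'Contractor': [2, 3], 'Fees': [3]}
```

The resulting dictionary maps directly onto rows of a word / page number table, which is the shape needed for the Excel step.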

Short · August 22, 2022, 10:45am

Amazing, thank you so much!

system · August 25, 2022, 10:45am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
