How to find a word in a PDF as a Whole word

chauhan.rachita30 · May 2, 2024, 12:17am

I am trying to find these terms in a PDF: Intravenous, IV, IVs

The problem is that the search finds IV inside other words, such as:
“relatIVe”, “positIVe”, “caregIVer”

and reports the term as found, even though it does not appear as an individual word.

How can I search for whole words only, the way Find works in MS Word?

Is there a script or something I can use?
I would really appreciate the help.

KevinE · May 2, 2024, 1:07am

Use the Is Text Matching activity which allows you to search more specifically with regex

Your regex pattern for IV would be \bIV\b
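
As a rough illustration of what the `\b` word boundaries do, here is a small sketch in Python (not UiPath; Python's `re` module treats `\b` the same way for these inputs). The sample strings are taken from the question plus two made-up sentences.

```python
import re

# \b marks a word boundary, so "IV" only matches when it stands alone.
pattern = re.compile(r"\bIV\b")

samples = ["relatIVe", "positIVe", "caregIVer", "Start the IV now.", "(IV)"]

# Map each sample to whether the whole word "IV" occurs in it.
matches = {s: bool(pattern.search(s)) for s in samples}

for s, hit in matches.items():
    print(f"{s!r}: {hit}")
```

Punctuation and the start or end of the text count as boundaries, so "(IV)" is still found while "relatIVe" is not.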

chauhan.rachita30 · May 2, 2024, 2:10am

Thank you. Can you give the exact regex?

vrdabberu · May 2, 2024, 2:21am

Hi @chauhan.rachita30

Try this expression as the condition of an If activity:

System.Text.RegularExpressions.Regex.IsMatch(in_strPdfFullText,"\b(Intravenous|IV|IVs)\b")

Let me know if you have any queries.

Regards

KevinE · May 2, 2024, 2:22am

Your regex pattern for IV would be \bIV\b

chauhan.rachita30 · May 2, 2024, 3:47pm

What if I have a variable that stores the value?

How do I write the expression?

postwick · May 2, 2024, 4:00pm

Just put a space before and after it…

...Contains(" " + MFTerm + " " )

vrdabberu · May 2, 2024, 4:49pm

Hi @chauhan.rachita30

Can you tell me exactly what you are trying to search for? I will help accordingly.

Regards

chauhan.rachita30 · May 2, 2024, 4:49pm

How do I make the expression case-insensitive for MFTerm?

chauhan.rachita30 · May 2, 2024, 4:50pm

There are multiple terms.

vrdabberu · May 2, 2024, 4:51pm

Hi @chauhan.rachita30

Can you share the whole text and the words you want to search for in it?

Regards

mukesh.singh · May 2, 2024, 7:38pm

Here’s a solution in UiPath to find a word as a whole word (not part of another word) in a PDF and provide multiple options for handling the results:

1. Using “Read PDF Text” Activity:

This approach extracts the entire PDF text and performs string manipulation to find whole words.

Steps:

 1. Drag a "Read PDF Text" activity to your workflow.
 2. Set the "File Path" property to the path of your PDF file.
 3. Create a variable of type `String` to store the extracted text.
 4. Connect the activity and assign the output (`Text`) to the string variable (e.g., `pdfText`).

Finding Whole Words:

 1. Before the loop, create a string variable to hold the word being built (e.g., `currentWord`) and a list for the results (e.g., `List<Tuple<string, int>> results = new List<Tuple<string, int>>();`).
 2. Use a "For Each" activity to iterate over each character in the `pdfText` variable, keeping track of the character index `i`.
 3. Inside the loop:
    - If the current character is a letter (use `Char.IsLetter(pdfText[i])`) or an underscore, append it to `currentWord`. Treat a hyphen the same way if hyphenated words should count as one word.
    - Otherwise the current word has ended:
      - If `currentWord` is not empty and matches your search word (use `currentWord.Equals("yourWord", StringComparison.Ordinal)` for a whole word match), add the word and its position (index `i`, which at this point is just past the end of the word) to `results`.
      - Clear `currentWord`, whether or not it matched.
 4. After the loop, run the same check once more so that a word at the very end of the text is not missed.
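
The character-by-character scan described above can be sketched roughly as follows. This is Python rather than a UiPath workflow, and the function name `find_whole_words` and its parameters are made up for illustration. It records the index where each word starts, resets the buffer after every word, and flushes the last word at the end of the text.

```python
def find_whole_words(text, target, ignore_case=False):
    """Return (word, start_index) pairs where a whole word equals target."""
    results = []
    current = []   # characters of the word currently being built
    start = 0      # index where the current word began
    wanted = target.lower() if ignore_case else target

    # Go one position past the end so the final word is flushed too.
    for i in range(len(text) + 1):
        ch = text[i] if i < len(text) else ""
        if ch.isalpha() or ch == "_":
            if not current:
                start = i
            current.append(ch)
            continue
        if current:
            word = "".join(current)
            candidate = word.lower() if ignore_case else word
            if candidate == wanted:
                results.append((word, start))
            current = []   # reset after every word, matched or not
    return results


hits = find_whole_words("A relatIVe gave IV, then IVs. IV", "IV")
print(hits)
```

Because only letters and underscores extend a word here, digits and punctuation act as separators; adjust the character test if your terms need a different rule.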

2. Using “PDF Text Scope” Activity (UiPath.PDF.Activities Package):

If available, this approach uses a dedicated activity to extract text with more control over whitespace handling.

Steps:

 1. Install the "UiPath.PDF.Activities" package (if not already installed).
 2. Drag a "PDF Text Scope" activity to your workflow.
 3. Set the "FilePath" property to the PDF file path.
 4. Inside the scope, use a "For Each" activity iterating over the extracted text elements (`TextElements`).
 5. Inside the loop:
    - Check if the current text element is a text node (`TextElement.Kind.Equals(TextElementKind.Text)`).
    - If yes, use string comparison methods like `Equals` (as in method 1) to find whole word matches in the `TextElement.Text` property.
    - Add the word and its location information (extracted from `TextElements`) to your results list.

3. Providing Multiple Options:

Once you have the list of whole word occurrences:
- Use an “If” activity to check if the list is empty:
  - If empty, display a message indicating the word was not found.
- If not empty, you have multiple options:
  - Use a “Write Line” activity to simply log the found words and their positions.
  - Create a message box displaying the words and their positions.
  - Use the results list to perform further actions, such as highlighting the words in the PDF (using external libraries or custom development).

Additional Considerations:

You can modify the string comparison logic (e.g., case-sensitive or insensitive) by adjusting the StringComparison parameter.
Consider error handling for potential exceptions during PDF reading or text extraction.

This approach provides a flexible solution for finding whole words in a PDF and handling results in various ways using UiPath. Choose the method that best aligns with your UiPath version, available packages, and desired outcome.

KevinE · May 2, 2024, 9:23pm

Use the UiPath activity Is Text Matching, and tick the box for case-insensitive.
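
If the terms are held in a variable, the same idea can be sketched in code: escape each term, join the terms with `|`, wrap the group in `\b`, and switch on case-insensitive matching. The example below is Python for illustration only; `mf_terms` is a hypothetical stand-in for the MFTerm values, and `build_whole_word_pattern` is a made-up helper name. In a .NET expression the corresponding pieces would be `Regex.Escape` and `RegexOptions.IgnoreCase`.

```python
import re


def build_whole_word_pattern(terms):
    """Compile one case-insensitive whole-word pattern for a list of terms."""
    # Escape each term so characters such as "." or "+" are matched literally.
    # Longer terms go first so that "IVs" is tried before "IV".
    escaped = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


mf_terms = ["Intravenous", "IV", "IVs"]   # hypothetical search terms
pattern = build_whole_word_pattern(mf_terms)

text = "The caregiver gave intravenous fluids; two IVs were placed."
found = [m.group(0) for m in pattern.finditer(text)]
print(found)
```

Note that `\b` relies on the term starting and ending with a letter, digit, or underscore; terms that begin or end with punctuation would need a different boundary check.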

KevinE · May 2, 2024, 9:28pm

Padding the term with a space on each side will only find matches that are surrounded by spaces. It won’t find the term when it is next to punctuation or a line break, or when it sits at the beginning or end of the string.
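
The limitation is easy to see with a few made-up sentences. This is a Python sketch in which `padded in text` plays the role of `Contains(" " + MFTerm + " ")`; the sample sentences are invented for illustration.

```python
term = "IV"
padded = " " + term + " "   # the space-padded search string

samples = {
    "mid-sentence": "Start the IV drip now.",
    "before comma": "Start the IV, then wait.",
    "end of text": "Patient needs an IV",
    "start of text": "IV access was obtained.",
    "line break": "insert the IV\nand monitor",
}

# True only where the term has a literal space on both sides.
results = {label: padded in text for label, text in samples.items()}

for label, hit in results.items():
    print(f"{label}: {hit}")
```

Only the mid-sentence case is found; the other four contain IV as a whole word but are missed.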

