How to find a word in a PDF as a whole word

I am trying to look for these terms in a PDF: Intravenous, IV, IVs

The problem is that it's finding IV inside other words, like:
“relatIVe”, “positIVe”, “caregIVer”

and reporting the value as found, even though it's not there as an individual word.

How can I search for whole words only, the way Find works in MS Word?

Is there a script or something I can use?
I would really appreciate the help.

Use the Is Text Matching activity, which allows you to search more specifically with regex.

Your regex pattern for IV would be \bIV\b
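
To see what the word boundary `\b` changes, here is a minimal Python sketch (Python rather than UiPath, purely for illustration; the sample text is made up from the words in the question). Python's `re` module and .NET regex both support `\b`, so the pattern itself should carry over.

```python
import re

# Sample text reusing the words from the question (capitalisation kept from the post).
text = "relatIVe, positIVe, caregIVer, and one IV line"

# A plain substring count also picks up the "IV" buried inside longer words.
substring_count = text.count("IV")

# \b marks a word boundary, so only the standalone "IV" is matched.
whole_words = re.findall(r"\bIV\b", text)

print(substring_count)  # 4
print(whole_words)      # ['IV']
```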

Thank you. Can you give the exact regex?

Hi @chauhan.rachita30

Try this syntax in If:

System.Text.RegularExpressions.Regex.IsMatch(in_strPdfFullText,"\b(Intravenous|IV|IVs)\b")

Let me know if you have any queries.

Regards
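
The If condition above can be tried out on sample strings before wiring it into a workflow. This is a rough Python equivalent, not UiPath code: `has_term` is a hypothetical helper standing in for `Regex.IsMatch`, and the sample sentences are invented.

```python
import re

# Same pattern as in the If condition: any of the three terms as a whole word.
PATTERN = re.compile(r"\b(Intravenous|IV|IVs)\b")

def has_term(pdf_text: str) -> bool:
    """Return True if any term occurs as a whole word (stand-in for Regex.IsMatch)."""
    return PATTERN.search(pdf_text) is not None

print(has_term("Intravenous fluids were given."))   # True
print(has_term("Two IVs were started."))            # True
print(has_term("The caregIVer was positIVe."))      # False

# "IV" is tried before "IVs", but the trailing \b rejects it in front of "s",
# so the engine backtracks and matches the full "IVs".
print(PATTERN.search("Two IVs were started.").group())  # IVs
```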

Your regex pattern for IV would be \bIV\b

What if I have a variable that stores the value?

How do I write the expression?

Just put a space before and after it…

...Contains(" " + MFTerm + " " )
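
An alternative when the term is held in a variable is to build the word-boundary pattern from it. A minimal Python sketch follows, assuming the term is plain text: `contains_whole_word` and `mf_term` are hypothetical names standing in for the workflow's own expression and the `MFTerm` variable. In .NET the matching calls would be `Regex.Escape` and `Regex.IsMatch`.

```python
import re

def contains_whole_word(text: str, term: str) -> bool:
    """Check whether `term` occurs in `text` as a whole word."""
    # re.escape keeps characters such as "." or "+" in the term literal.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text) is not None

mf_term = "IV"  # stand-in for the MFTerm variable
sample = "Start IV, then monitor."

print(contains_whole_word(sample, mf_term))       # True
print((" " + mf_term + " ") in sample)            # False: the comma is not a space
print(contains_whole_word("caregIVer", mf_term))  # False
```

One caveat: `\b` only works as expected when the term starts and ends with a letter, digit, or underscore.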

Hi @chauhan.rachita30

Can you tell me exactly what you are trying to search for? I will help you accordingly.

Regards

How do I make the expression case-insensitive for MFTerm?

There are multiple terms.

Hi @chauhan.rachita30

Can you share the whole text and the words you want to search for in it?

Regards

Here’s a solution in UiPath to find a word as a whole word (not part of another word) in a PDF and provide multiple options for handling the results:

1. Using “Read PDF Text” Activity:

  • This approach extracts the entire PDF text and performs string manipulation to find whole words.

Steps:

 1. Drag a "Read PDF Text" activity to your workflow.
 2. Set the "File Path" property to the path of your PDF file.
 3. Create a variable of type `String` to store the extracted text.
 4. Connect the activity and assign the output (`Text`) to the string variable (e.g., `pdfText`).

Finding Whole Words:

 1. Before the loop, create a string variable to hold the word being built (e.g., `currentWord`), so it keeps its value between iterations.
 2. Use a "For Each" activity iterating over each character in the `pdfText` variable.
 3. Inside the loop:
    - If the current character is a letter (use `Char.IsLetter(pdfText[i])`) or a hyphen (for hyphenated words), append it to `currentWord`.
    - Otherwise the current word has ended:
      - If `currentWord` is not empty and matches your search word (use `currentWord.Equals("yourWord", StringComparison.Ordinal)` for a whole-word match), add the word and its position (index `i` in the loop) to a list of results (e.g., `List<Tuple<string, int>> results = new List<Tuple<string, int>>();`).
      - Clear `currentWord` whether or not it matched, so the next word starts empty.
 4. After the loop, run the same check once more so that a word at the very end of the text is not missed.
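
The character-by-character idea can be sketched in Python as follows. This is an illustration of the technique, not a UiPath workflow: `find_whole_word` is a hypothetical helper, it treats only letters as word characters, and it records the index where each word starts.

```python
from typing import List, Tuple

def find_whole_word(text: str, target: str) -> List[Tuple[str, int]]:
    """Return (word, start_index) for every whole-word occurrence of `target`."""
    results: List[Tuple[str, int]] = []
    current: List[str] = []  # letters of the word being built
    start = 0                # index where the current word began

    # A trailing space acts as a sentinel so the last word is flushed too.
    for i, ch in enumerate(text + " "):
        if ch.isalpha():
            if not current:
                start = i
            current.append(ch)
        else:
            word = "".join(current)
            if word == target:  # exact, case-sensitive comparison
                results.append((word, start))
            current = []        # reset whether or not the word matched
    return results

print(find_whole_word("IV relatIVe IV.", "IV"))  # [('IV', 0), ('IV', 12)]
```

Resetting the buffer at every non-letter is what keeps "relatIVe" from being counted; a regex with `\b` gives the same result with far less code.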

2. Using “PDF Text Scope” Activity (UiPath.PDF.Activities Package):

  • If available, this approach uses a dedicated activity to extract text with more control over whitespace handling.

Steps:

 1. Install the "UiPath.PDF.Activities" package (if not already installed).
 2. Drag a "PDF Text Scope" activity to your workflow.
 3. Set the "FilePath" property to the PDF file path.
 4. Inside the scope, use a "For Each" activity iterating over the extracted text elements (`TextElements`).
 5. Inside the loop:
    - Check if the current text element is a text node (`TextElement.Kind.Equals(TextElementKind.Text)`).
    - If yes, use string comparison methods like `Equals` (as in method 1) to find whole word matches in the `TextElement.Text` property.
    - Add the word and its location information (extracted from `TextElements`) to your results list.

3. Providing Multiple Options:

  • Once you have the list of whole word occurrences:
    • Use an “If” activity to check if the list is empty:
      • If empty, display a message indicating the word was not found.
    • If not empty, you have multiple options:
      • Use a “Write Line” activity to simply log the found words and their positions.
      • Create a message box displaying the words and their positions.
      • Use the results list to perform further actions, such as highlighting the words in the PDF (using external libraries or custom development).

Additional Considerations:

  • You can modify the string comparison logic (e.g., case-sensitive or insensitive) by adjusting the StringComparison parameter, for example `StringComparison.OrdinalIgnoreCase` for a case-insensitive match.
  • Consider error handling for potential exceptions during PDF reading or text extraction.

This approach provides a flexible solution for finding whole words in a PDF and handling results in various ways using UiPath. Choose the method that best aligns with your UiPath version, available packages, and desired outcome.

Use the UiPath activity Is Text Matching, and tick the box for case-insensitive.

Note that the Contains(" " + MFTerm + " ") approach will only find matches surrounded by a space on each side. It won’t find matches next to punctuation, a new line, or the beginning or end of the string.
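
To make the difference concrete, here is a small Python sketch comparing a space-padded substring check with a case-insensitive word-boundary pattern built from several terms. `find_terms` is a hypothetical helper and the sample text is invented; in .NET the case-insensitive switch would be `RegexOptions.IgnoreCase`.

```python
import re
from typing import List

def find_terms(text: str, terms: List[str]) -> List[str]:
    """Return every whole-word, case-insensitive occurrence of any term."""
    # Escape each term, then join them into one alternation.
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return [m.group() for m in pattern.finditer(text)]

terms = ["Intravenous", "IV", "IVs"]
text = "IV\nintravenous access (iv), no IVs; relative stable."

# Space padding misses every occurrence here: each "IV" touches a newline,
# a bracket, a semicolon, or the start of the string.
print(" iv " in text.lower())   # False

# Word boundaries catch all four and still skip "relative".
print(find_terms(text, terms))  # ['IV', 'intravenous', 'iv', 'IVs']
```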
