Here’s a solution in UiPath to find a word as a whole word (not part of another word) in a PDF and provide multiple options for handling the results:
1. Using “Read PDF Text” Activity:
This approach extracts the entire PDF text and performs string manipulation to find whole words.
Steps:
1. Drag a "Read PDF Text" activity (from the UiPath.PDF.Activities package) into your workflow.
2. Set the "FileName" property to the path of your PDF file.
3. Create a variable of type `String` to store the extracted text (e.g., `pdfText`).
4. In the activity's Output section, set the `Text` property to that variable.
Finding Whole Words:
1. Before the loop, create another string variable to hold the word currently being built (e.g., `currentWord`). Creating it inside the loop would reset it on every character.
2. Use a "For Each" activity iterating over each character in the `pdfText` variable, with the item type set to `Char`. To record positions, use the activity's index output if your version has one, or keep a separate counter variable, since the loop does not provide an index `i` by itself.
3. Inside the loop:
- If the current character is a letter (use `Char.IsLetter(currentChar)`) or a hyphen (so that hyphenated words stay in one piece), append it to `currentWord`.
- Otherwise the character is a word boundary:
  - If `currentWord` is not empty and matches your search word (use `currentWord.Equals("yourWord", StringComparison.Ordinal)`), add the word and its start position (the current index minus `currentWord.Length`) to a list of results (e.g., a `List<Tuple<string, int>>`).
  - Clear `currentWord`.
4. After the loop, run the same check once more. A word that ends exactly at the end of the text is never followed by a boundary character, so it would otherwise be missed.
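The character loop described above could be sketched in C# roughly as follows, for example as the body of an "Invoke Code" activity or a small helper library. The class and method names (`WholeWordScanner`, `FindWholeWord`) are made up for illustration, and the sketch assumes the text has already been extracted into a string. It treats letters and hyphens as word characters, which is an assumption you may want to adjust for your documents.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any UiPath package.
public static class WholeWordScanner
{
    // Returns (word, startIndex) for every whole-word occurrence of searchWord in text.
    public static List<Tuple<string, int>> FindWholeWord(
        string text, string searchWord,
        StringComparison comparison = StringComparison.Ordinal)
    {
        var results = new List<Tuple<string, int>>();
        var currentWord = new StringBuilder();

        // Run one step past the end so the last word is checked too.
        for (int i = 0; i <= text.Length; i++)
        {
            // Assumption: letters and hyphens belong to a word.
            bool isWordChar = i < text.Length
                && (char.IsLetter(text[i]) || text[i] == '-');

            if (isWordChar)
            {
                currentWord.Append(text[i]);
                continue;
            }

            // Boundary or end of text: compare the finished word, then reset.
            if (currentWord.Length > 0)
            {
                string word = currentWord.ToString();
                if (word.Equals(searchWord, comparison))
                {
                    results.Add(Tuple.Create(word, i - word.Length));
                }
                currentWord.Clear();
            }
        }
        return results;
    }
}
```

Calling `WholeWordScanner.FindWholeWord("cat concat cat.", "cat")` should give two entries, at positions 0 and 11. The `cat` inside `concat` is skipped, and the trailing period counts as a boundary.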
2. Using “PDF Text Scope” Activity (UiPath.PDF.Activities Package):
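If available, this approach works on individual text elements instead of one large string, which gives more control over whitespace handling. Verify that the activity and the `TextElements` collection exist in your installed package version before relying on this method.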
Steps:
1. Install the "UiPath.PDF.Activities" package (if not already installed).
2. Drag a "PDF Text Scope" activity to your workflow.
3. Set the "FilePath" property to the PDF file path.
4. Inside the scope, use a "For Each" activity iterating over the extracted text elements (`TextElements`).
5. Inside the loop:
- Check if the current text element is a text node (`TextElement.Kind.Equals(TextElementKind.Text)`).
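- If yes, split the `TextElement.Text` property into words (or reuse the loop from method 1) and compare each word with `Equals`. Comparing the whole element text with `Equals` only matches when the element contains nothing but the search word.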
- Add the word and its location information (extracted from `TextElements`) to your results list.
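When an element holds a whole line or paragraph, a regular expression with `\b` word boundaries is one way to find whole-word matches inside it. The sketch below assumes the element texts have already been collected into a plain list of strings; `ElementMatcher`, `FindInElements`, and `elementTexts` are hypothetical names for illustration and not part of any UiPath package.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; elementTexts stands in for whatever text chunks you extracted.
public static class ElementMatcher
{
    // Returns (elementIndex, charIndex) for each whole-word hit across the given texts.
    public static List<Tuple<int, int>> FindInElements(
        IList<string> elementTexts, string searchWord)
    {
        var hits = new List<Tuple<int, int>>();

        // Regex.Escape keeps characters such as '.' or '+' in the search word literal.
        var pattern = new Regex(@"\b" + Regex.Escape(searchWord) + @"\b");

        for (int e = 0; e < elementTexts.Count; e++)
        {
            foreach (Match m in pattern.Matches(elementTexts[e]))
            {
                hits.Add(Tuple.Create(e, m.Index));
            }
        }
        return hits;
    }
}
```

Note that `\b` treats letters, digits, and underscores as word characters, so a hyphen counts as a boundary here. Punctuation, line breaks, and the start or end of the string also count as boundaries, so matches next to them are found as well.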
3. Providing Multiple Options:
Once you have the list of whole word occurrences:
Use an “If” activity to check if the list is empty:
- If empty, display a message indicating the word was not found.
- If not empty, you have multiple options:
  - Use a “Write Line” activity to simply log the found words and their positions.
  - Create a message box displaying the words and their positions.
  - Use the results list to perform further actions, such as highlighting the words in the PDF (using external libraries or custom development).
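As a rough sketch of the empty/non-empty branching, the text for a "Write Line" or message box could be built like this. `ResultReporter` and `Describe` are hypothetical names, and the results list is assumed to be the `List<Tuple<string, int>>` from method 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that turns the results list into a display string.
public static class ResultReporter
{
    public static string Describe(string searchWord, IList<Tuple<string, int>> results)
    {
        // Empty (or missing) list: report that nothing was found.
        if (results == null || results.Count == 0)
        {
            return $"The word '{searchWord}' was not found.";
        }

        // Otherwise list every position in the order it was recorded.
        string positions = string.Join(", ", results.Select(r => r.Item2));
        return $"Found '{searchWord}' {results.Count} time(s) at position(s): {positions}";
    }
}
```

The returned string can be passed directly to whichever output activity you choose.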
Additional Considerations:
You can modify the string comparison logic (e.g., case-sensitive or insensitive) by adjusting the StringComparison parameter.
Consider wrapping the PDF activities in a "Try Catch" activity to handle exceptions during PDF reading or text extraction, such as a missing file or a password-protected PDF.
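To illustrate the comparison setting, here is a minimal sketch of switching between case-sensitive and case-insensitive matching. `ComparisonDemo` and `IsMatch` are names invented for this example.

```csharp
using System;

// Hypothetical helper showing how the StringComparison value changes the match.
public static class ComparisonDemo
{
    public static bool IsMatch(string candidate, string searchWord, bool ignoreCase)
    {
        // Ordinal compares character by character; OrdinalIgnoreCase ignores letter case.
        StringComparison mode = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        return candidate.Equals(searchWord, mode);
    }
}
```

With this, `IsMatch("Invoice", "invoice", true)` returns `true`, while the same call with `false` returns `false`.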
This approach provides a flexible solution for finding whole words in a PDF and handling results in various ways using UiPath. Choose the method that best aligns with your UiPath version, available packages, and desired outcome.
This will only find matches surrounded by a space on each side. It won’t find matches if there is punctuation, new line, beginning or end of the string, etc.