How to find the repeated word in the PDF

Hello all,
I need your support with PDF automation. Here is the scenario:
Extract a repeated word from the PDF and count how many times it is repeated.

Hi @Kabeer

Add UiPath.PDF.Activities to your project

Declare these five variables:

MyWord (a String for the word you’re looking for)
Pattern (a String for the pattern used in the regex for matching your word).
DocumentText (a String for the text extracted from the PDF)
WordMatches (a MatchCollection for the regex result)
WordCount (an Integer for your answer)

Use the “Read PDF Text” activity to extract the text from the PDF into DocumentText.

Use an “Assign” activity to set Pattern to a regex that matches your word as a whole word:
String.Format("\b{0}\b", MyWord)

Use an “Assign” activity to set WordMatches to the result of searching the text with that pattern:
System.Text.RegularExpressions.Regex.Matches(DocumentText, Pattern)

Use an “Assign” activity to set WordCount to the number of matches:
WordMatches.Count
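
For readers who want to try the regex logic outside UiPath, here is a minimal sketch of the same three assignments in Python. It uses Python's `re` module as a stand-in for the .NET `Regex` class, so it is an illustration of the idea rather than the workflow itself. The `count_word` helper and the sample text are hypothetical, and the `re.escape` call is an extra safeguard that the steps above do not include.

```python
import re


def count_word(document_text: str, my_word: str) -> int:
    """Count whole-word, case-sensitive occurrences of my_word."""
    # Escaping protects against regex metacharacters in the word
    # (an addition, not part of the original steps).
    # \b on each side restricts matches to whole words.
    pattern = r"\b{0}\b".format(re.escape(my_word))
    word_matches = re.findall(pattern, document_text)
    return len(word_matches)


# Hypothetical text standing in for the output of "Read PDF Text".
document_text = "UiPath reads the PDF. Then uipath counts words; UiPath is done."
word_count = count_word(document_text, "UiPath")
print(word_count)
```

Note that the match is case-sensitive, as in the original expressions: the lowercase "uipath" in the sample text is not counted.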

Hi @msan

Read the PDF text with OCR, pass the resulting String variable to the Matches activity, and use the pattern (?<=your word).*

Thanks
Ashwin.S
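
It may be worth checking what each pattern returns before relying on its match count. The sketch below is a small Python comparison using the `re` module in place of the .NET regex engine, with a made-up sample string. In this sample the lookbehind pattern produces a single match containing the rest of the line, while a word-boundary pattern produces one match per occurrence.

```python
import re

# Hypothetical sample text with two occurrences of the word.
text = "uipath a uipath b"
word = "uipath"

# Lookbehind pattern: matches whatever follows the first occurrence on the line.
lookbehind_matches = re.findall(r"(?<={0}).*".format(re.escape(word)), text)

# Word-boundary pattern: one match per whole-word occurrence.
boundary_matches = re.findall(r"\b{0}\b".format(re.escape(word)), text)

print(len(lookbehind_matches), len(boundary_matches))
```

If the goal is a count of occurrences, the number of word-boundary matches is the figure to use.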

@Kabeer
Refer to https://www.rexegg.com/regex-quickstart.html for more insight on regex.

Which word do you need to count? You can use the Regex Match activity. I can help if you tell me what the word is.

Hello msan,
Thanks for the instructions. If you have a sample project for this scenario, please share it with me so I can understand it better.

Hi,
For example: the PDF has 4 pages, and we need to find the given word on all of them and count its occurrences.
Input word: uipath (needs to be checked on every page).
Output: The word repeated 4 times

  1. Use the PDF Read activity to read the PDF file and store the result in a String variable
  2. Use the Regex Matches activity to find every occurrence of the word
  3. Use RegexResult.Count to get the number of occurrences
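
The three steps above can be sketched roughly as follows. This is Python rather than a UiPath workflow, with the PDF read step replaced by hypothetical page strings. The `count_word_in_pages` helper is made up for illustration, and case-insensitive matching is an assumption, since the thread does not say whether "uipath" and "UiPath" should both count.

```python
import re


def count_word_in_pages(pages, word):
    """Return (per-page counts, total) for whole-word matches of word."""
    # re.IGNORECASE is an assumption; drop it for case-sensitive counting.
    pattern = re.compile(r"\b{0}\b".format(re.escape(word)), re.IGNORECASE)
    per_page = [len(pattern.findall(page_text)) for page_text in pages]
    return per_page, sum(per_page)


# Hypothetical page texts standing in for the output of the PDF read step.
pages = [
    "UiPath automates tasks.",
    "Install uipath Studio first.",
    "No match on this page.",
    "UiPath and UIPATH both count here.",
]
per_page, total = count_word_in_pages(pages, "uipath")
message = "The word repeated {0} times".format(total)
print(message)
```

Keeping the per-page counts alongside the total makes it easy to see which pages contribute to the result.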

@Kabeer

Here it is

CountWordInPdf.xaml (7.4 KB)

Thanks a lot msan