How to Check if a String Exists in a PDF File?

Hi all,

How can I check whether a given string exists in a PDF file?
I have a list of words in an Excel sheet, and I need to check each of them, one by one, to see whether it exists in the PDF file. After that, I need to write the result back to the same Excel workbook/sheet.

Thanks in advance.

Hi @hsendel,
I suggest you first read your PDF using an OCR engine. Once you have the plain text, you can use string functions to check whether each word in the list is contained in the text extracted from the PDF.
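
To make the suggested check concrete, here is a minimal Python sketch of the logic (not UiPath code). It assumes the PDF text has already been extracted into a single string; `find_words` and `sample_text` are hypothetical names, and the case-insensitive comparison is a choice, not a requirement. Inside a UiPath workflow the same test would be a `String.Contains` call in a For Each loop.

```python
def find_words(pdf_text: str, words: list[str]) -> dict[str, bool]:
    """Return, for each word, whether it occurs anywhere in the extracted PDF text."""
    # Compare case-insensitively; drop the .lower() calls for an exact-case check.
    haystack = pdf_text.lower()
    return {word: word.lower() in haystack for word in words}


if __name__ == "__main__":
    # Hypothetical sample text standing in for the text extracted from the PDF.
    sample_text = "This certificate confirms the invoice was paid."
    print(find_words(sample_text, ["Certificate", "invoice", "refund"]))
    # → {'Certificate': True, 'invoice': True, 'refund': False}
```

The resulting True/False per word is what would then be written back to the Excel sheet.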

1 Like

Hello hsendel,

→ First, read your PDF using UiPath.PDF.Activities (if your PDF is a scan/picture, there is an OCR activity too).
→ The output will be a string. Try something like this:
For each row in dtexcel
- If pdfstring.Contains(row(wordtocapture).ToString)
(you can use a regex instead of Contains)
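
The difference between a plain `Contains` check and a regex is easiest to see with a short word that also occurs inside longer words. The Python sketch below (hypothetical helper `contains_whole_word`) uses a `\b` word-boundary pattern; whether you want whole-word or substring matching is an assumption about your data.

```python
import re


def contains_whole_word(text: str, word: str) -> bool:
    """True if `word` appears in `text` as a whole word (case-insensitive)."""
    # re.escape treats characters such as '.' or '(' in the word literally;
    # \b anchors the match at word boundaries so "cat" does not match "certificate".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    text = "This certificate confirms the invoice was paid."
    print("cat" in text)                         # substring check → True
    print(contains_whole_word(text, "cat"))      # whole-word check → False
    print(contains_whole_word(text, "Invoice"))  # → True
```

`\b` only behaves as expected when the searched word starts and ends with a letter, digit or underscore; a term such as `C++` would need a different pattern.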

Tell me if it is not clear and I will upload a XAML ^^

3 Likes

Thanks @GersonTun & @Elya. My case is different from finding a string on a single PDF page: my file contains many pages, so to search it I need to use the PDF's “Search” option instead of OCR.

1 Like

@hsendel - what are you planning to do when you find the string on a particular page?

1 Like

@hsendel Actually, the Read PDF Text activity reads all the pages at once, not page by page.

2 Likes

Hello @prasath17, I will just record in the Excel sheet that this string exists.

@GersonTun, is there any limit on the number of pages?

@hsendel - Please take a look at this post; it should help you. There I look for a string and, if that string exists, I remove that page. Ignore that part and use the rest.

The only thing you have to change is to pass your DataTable row instead of the hard-coded text; that's it.

1 Like

Thanks @prasath17, I will check and get back to you.

@prasath17, this is exactly what I want, but is it possible to also show on which pages the string exists, after the Match activity?

@hsendel - Yes, very simple.

Index + 1 is the number of the page you are reading (the index starts at 0). So when you get a hit, just print index + 1; that's all.
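
As a sketch of the index + 1 idea, assuming the text has already been split into one string per page (how you obtain that list is outside this example), the following Python function returns the page numbers on which a word appears. `pages_containing` and the sample pages are hypothetical.

```python
def pages_containing(pages: list[str], word: str) -> list[int]:
    """Return the 1-based page numbers whose text contains `word` (case-insensitive)."""
    hits = []
    # enumerate() yields a zero-based index, so index + 1 is the page number.
    for index, page_text in enumerate(pages):
        if word.lower() in page_text.lower():
            hits.append(index + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical per-page text; in practice each entry would come from reading one page.
    pages = ["Cover page", "Certificate of origin", "Terms", "Certificate copy"]
    print(pages_containing(pages, "certificate"))  # → [2, 4]
```

Joining the returned list (for example `", ".join(map(str, hits))`) gives a value that fits in a single Excel cell.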

1 Like

Never go for OCR unless you have no other choice.

Please check this post.

1 Like

Awesome! @prasath17, is it possible to add the number of occurrences of the searched string?

@hsendel, yes, it is possible.

In the above post I look for the word “certificate” using a regex, right? You can use the same logic with a small tweak, as below:

 IntCount = System.Text.RegularExpressions.Regex.Matches(yourstring, regexPattern).Count
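
For readers working outside UiPath, a rough Python counterpart of `Regex.Matches(...).Count` is `len(re.findall(...))`. The sketch below assumes you want case-insensitive, whole-word matches; `count_occurrences` is a hypothetical helper and the sample text is made up for illustration.

```python
import re


def count_occurrences(text: str, word: str) -> int:
    """Count non-overlapping, case-insensitive whole-word matches of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    # len(re.findall(...)) plays the same role as Regex.Matches(...).Count in .NET.
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


if __name__ == "__main__":
    text = "Certificate no. 1. See the attached certificate; certificates expire."
    print(count_occurrences(text, "certificate"))  # → 2 ("certificates" is not counted)
```

Dropping the two `\b` anchors would count substring hits as well, so "certificates" would then be included.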
2 Likes

Great job @prasath17. The topic can be closed :+1:

1 Like

Hello @prasath17, in the end I am not able to get the total number of occurrences; it seems something is missing. Could you please advise?

Note: I have removed the “Solution” mark just to avoid closing the topic; I will set it again later on.

@hsendel - where are you using this? Please show me your Assign activity.

If you are using this in an Assign statement, please try as shown below:

[Screenshot of the Assign activity]

Note: in your case, just remove the .Value.

1 Like

Thanks @prasath17, I’ll check and get back to you.