Fuzzy match in a PDF text

leonardo961900 · December 21, 2021, 4:00pm

Hi!
I’m facing this problem: I have a non-native PDF to read and I need to check whether certain words are present.
Once I have extracted the PDF text as a String, I assign a variable using the following code:
myvar = System.Text.RegularExpressions.Regex.IsMatch(pdf_string, matching_words, System.Text.RegularExpressions.RegexOptions.IgnoreCase)

In my particular case I have to check whether the string “A. Einstein, B. Poborsky” is present in the PDF.

However, the Read activity isn’t 100% accurate. For example, it returns the text “A. Einstfin, B. Pqborsky”, so the match is “False”.

Is it possible to add a confidence threshold to the match, so that near-misses like this still count?
Thank you in advance

kadiravan_kalidoss · December 21, 2021, 5:34pm

Hi @leonardo961900,

Thanks!

leonardo961900 · December 21, 2021, 5:59pm

Hi, thanks. I’ve already seen the link you sent. The problem is that I’m new to UiPath and I don’t know anything about Python.
I was looking for something more direct that doesn’t go through Python.

dokumentor · December 21, 2021, 6:38pm

Hi @leonardo961900, you can approach this in two ways. One is to apply fuzzy matching, which involves integrating third-party libraries like the one cited by @kadiravan_kalidoss. I don’t know whether you’ll find similar functionality in a native or “easy to use” activity; at least I haven’t.
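
To make the fuzzy matching idea concrete, here is a minimal sketch in Python, since the approach linked above goes through Python. It uses only the standard library's difflib rather than any particular third-party package. The function names (best_window_similarity, fuzzy_contains) and the 0.85 threshold are assumptions made up for illustration, not part of UiPath or of the cited library. It slides a window the length of the search phrase over the extracted text and keeps the best similarity score, which plays the role of the "confidence" asked about.

```python
from difflib import SequenceMatcher


def best_window_similarity(text: str, phrase: str) -> float:
    """Return the highest similarity (0.0 to 1.0) between `phrase` and any
    window of `text` with the same length, ignoring case."""
    text, phrase = text.lower(), phrase.lower()
    n = len(phrase)
    if n == 0:
        return 1.0  # an empty phrase trivially matches
    if len(text) <= n:
        return SequenceMatcher(None, text, phrase).ratio()
    best = 0.0
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        best = max(best, SequenceMatcher(None, window, phrase).ratio())
    return best


def fuzzy_contains(text: str, phrase: str, threshold: float = 0.85) -> bool:
    """True if some window of `text` is at least `threshold` similar to `phrase`.
    The 0.85 default is an arbitrary illustrative value; tune it on real scans."""
    return best_window_similarity(text, phrase) >= threshold


if __name__ == "__main__":
    pdf_string = "Authors: A. Einstfin, B. Pqborsky (2021)"
    matching_words = "A. Einstein, B. Poborsky"
    print(fuzzy_contains(pdf_string, matching_words))
```

With the misread text from the question, two wrong characters out of 24 still clear the 0.85 threshold, while an exact Regex.IsMatch fails. The window has a fixed length, so this tolerates substituted characters better than dropped or extra ones, and it is slow on very long texts. The logic is short enough that it could probably be ported to a .NET expression or code activity, but I haven't tried that.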

On the other hand, you can work on improving your OCR results. You have many options, and I recommend the following:

OmniPage is, in my opinion, the best of the options that are included for free. You have to install the corresponding package first.
Google Cloud Vision for photo-like scans. You need an API key, which you can obtain with your Google account. The first 1000 images are free; after that you’ll have to pay.
ABBYY FineReader for scan-type document images. Requires a subscription or paid license.

Hope it helps!
