Fuzzy match in a PDF text

Hi!
I’m facing the following problem. I have a non-native (scanned) PDF to read and I need to check whether certain words are present in it.
Once I have extracted the PDF as a String, I assign a variable using the following expression:
myvar=System.Text.RegularExpressions.Regex.IsMatch(pdf_string,matching_words,System.Text.RegularExpressions.RegexOptions.IgnoreCase)

In my particular case I have to check whether the string “A. Einstein, B. Poborsky” is present inside the PDF.

However, the Read activity isn’t 100% accurate. For example, it returns the text “A. Einstfin, B. Pqborsky”, so the match is “False”.

Is it possible to add a confidence threshold, so that a close but not exact match still counts?
Thank you in advance.

Hi @leonardo961900,

Thanks!

Hi, thanks. I’ve already seen the link you sent. The problem is that I’m new to UiPath and I don’t know anything about Python.
I was looking for something more direct that doesn’t go through Python.

Hi @leonardo961900, you can approach this in two ways. One is to apply fuzzy matching, which involves integrating third-party libraries like the one cited by @kadiravan_kalidoss. I don’t know whether you are going to find similar functionality in a native or “easy to use” activity; at least I haven’t.
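
To give an idea of what fuzzy matching does, here is a minimal sketch in Python using only the standard library module difflib. The function name fuzzy_contains and the 0.85 threshold are illustrative choices of mine, not part of any UiPath activity or of the library linked above. It slides a window the length of the search phrase over the text and keeps the best similarity ratio, so it assumes the OCR mostly substitutes characters rather than adding or dropping many of them.

```python
from difflib import SequenceMatcher


def fuzzy_contains(text, phrase, threshold=0.85):
    """Return (found, best_ratio): whether `phrase` approximately occurs in `text`.

    Hypothetical helper for illustration. `threshold` is a similarity
    ratio between 0 and 1; 0.85 is an arbitrary starting point to tune.
    """
    text = text.lower()      # case-insensitive, like RegexOptions.IgnoreCase
    phrase = phrase.lower()
    n = len(phrase)
    if n == 0:
        return True, 1.0

    best = 0.0
    # Compare the phrase against every window of the same length in the text.
    for start in range(max(1, len(text) - n + 1)):
        window = text[start:start + n]
        ratio = SequenceMatcher(None, phrase, window).ratio()
        if ratio > best:
            best = ratio
    return best >= threshold, best


# Example with the misread text from the question.
pdf_string = "Authors: A. Einstfin, B. Pqborsky. Published 2020."
found, score = fuzzy_contains(pdf_string, "A. Einstein, B. Poborsky")
print(found, round(score, 2))
```

The threshold plays the role of the “confidence” asked about: lowering it tolerates more OCR errors but increases the risk of false positives, so it is worth testing on a few real documents.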

On the other hand, you can work on improving your OCR recognition. You have many options, and I recommend the following:

  • OmniPage is, in my opinion, the best of the options that are included for free. You have to install the corresponding package first.
  • Google Cloud Vision for photo-like scans. You need an API key, which you can obtain with your Google account. The first 1000 images are free; after that you have to pay.
  • ABBYY FineReader for scanned document images. Requires a subscription or paid license.

Hope it helps!
