How to build a Fuzzy/Partial Contains function?

Hello, I am scraping text from scanned PDFs using OCR, and I’d like to check whether that text output contains a specific term. I can use the Contains function like so:

VarPdfOutput.Contains(VarSpecificTerm)

This returns TRUE or FALSE. However, Contains requires an exact match, and I need a partial match (80% similar or better) because the OCR makes mistakes with scanned documents.

In my case the VarSpecificTerm is usually multiple words and not a single word.

Here is an example of what I’d like to do:

VarPdfOutput = “This is an example of some text that has been scraped from a pdf”

VarSpecificTerm = “has been scraped”

If we use VarPdfOutput.Contains(VarSpecificTerm), it returns TRUE.

Now let’s say the OCR has made some mistakes in the VarPdfOutput, like so:

VarPdfOutput = “Thos in an example of S0me texL that has beon scrapen fron a pdg”

Now VarPdfOutput.Contains(VarSpecificTerm) returns FALSE. As a human, though, we can see that “has beon scrapen” is just an OCR misreading of “has been scraped” in the original scanned PDF, so I’d want it to return TRUE.

How can I build this fuzzy or partial contains function like so:

VarPdfOutput.FuzzyContains(VarSpecificTerm)

Then it will return TRUE.

It doesn’t necessarily have to be a function like the “FuzzyContains” example above, but how can I build this fuzzy contains functionality within UiPath? Or is there perhaps an activity for this?

I have done this in Excel, if that is any help to you. It’s very easy to implement.

The first response to this post provides a custom fuzzy function. It is a customised version of the kind of function, e.g. VLOOKUP, that is already built into Excel.

  1. Alt-F11

  2. Insert / Module

  3. Paste above code into code window

Once you’ve added this to your workbook, you can call it from a formula and get the result you need every time. You can automate this using UiPath: you could probably paste the value into a pre-made Excel template containing the custom fuzzy function and then grab the result. I could work up an example workflow for you, given time.

Thanks for your reply, @ronanpeter. Does this account for spelling errors within the words?

Yes. The custom fuzzy logic function returns a positive result when the match meets the confidence level you set, between 0 and 1 (0.8 in your case). I have used it exactly as shown in the forum post I linked and it worked a treat.
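
The Excel function itself is not reproduced in this thread, so the following is only an illustration of how a confidence score between 0 and 1 can be computed. It assumes a normalised Levenshtein (edit) distance; the linked function may well calculate its score differently. The names `levenshtein` and `similarity` are made up for this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring rule: 1.0 for identical strings, approaching 0.0
    as more edits are needed relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The OCR misreading from the question differs by two substitutions
# over 16 characters, so under this rule it clears a 0.8 threshold.
print(similarity("has beon scrapen", "has been scraped"))
```

Note that this compares two strings of similar length. It does not yet handle searching for a short term inside a long block of text.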

The only complication is that you want to match a substring, and I don’t know how that affects comparing the score against the confidence level. That is where it gets very complex, and I doubt you would find a way to adapt the Excel code I linked to accommodate it.

For me, Python would be the way to go here if the substring fuzzy logic is needed. There are packages which do this for you in a few lines of code:

And UiPath integrates nicely with Python. You could write a simple program to take the string variable and the substring (with mistakes) to match. It would output the confidence % back to UiPath and you could work from there.
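
As a rough sketch of what such a program could look like, the code below slides a window of words across the OCR text and scores each window against the search term. To stay dependency-free it uses `difflib.SequenceMatcher` from the Python standard library rather than a third-party fuzzy-matching package, so its scores will not be identical to what those packages report. The function names and the default threshold of 80 (taken from the question) are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_window_score(text: str, term: str) -> float:
    """Return the best similarity (0-100) between `term` and any run of
    consecutive words in `text` with the same word count as `term`."""
    text_words = text.lower().split()
    term_words = term.lower().split()
    if not term_words or len(text_words) < len(term_words):
        return 0.0
    target = " ".join(term_words)
    size = len(term_words)
    best = 0.0
    for start in range(len(text_words) - size + 1):
        window = " ".join(text_words[start:start + size])
        score = SequenceMatcher(None, window, target).ratio() * 100
        best = max(best, score)
    return best


def fuzzy_contains(text: str, term: str, threshold: float = 80.0) -> bool:
    """True if some window of `text` matches `term` at or above `threshold`."""
    return best_window_score(text, term) >= threshold


ocr_text = "Thos in an example of S0me texL that has beon scrapen fron a pdg"
print(best_window_score(ocr_text, "has been scraped"))
print(fuzzy_contains(ocr_text, "has been scraped"))
```

Comparing window by window keeps the score from being diluted by the rest of the document, which is the substring problem mentioned above. A fixed window size assumes the OCR has not merged or split words; if it has, trying windows one word shorter and longer would be a reasonable extension.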

I will try to put together a workflow to show you this if I can find the time, as it’s an interesting one.

Thank you so much. I used this Python code and it works perfectly:

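from fuzzywuzzy import fuzz

def FuzzyContains(Str1, Str2):
    # token_set_ratio returns an integer score from 0 to 100,
    # which can be compared against a threshold (e.g. 80) in UiPath
    Token_Set_Ratio = fuzz.token_set_ratio(Str1, Str2)
    return Token_Set_Ratio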
