Levenshtein distance

Levenshtein-Damerau.zip (120.4 KB)

I have attached the workflow file; please look at Levenshtein-Damerau.xaml. I edited the Fuzzy test function LD.ahk to use func%DLDistance, as it was only showing plain Levenshtein results. With DLD it uses the correct Damerau-Levenshtein distance.

Most of the actions are transformation of data from arrays, to string, to dt, etc.

I have reduced the end product by applying the threshold, but it still does a lot of unnecessary computing.

Did you perhaps not see my attachment with Fuzzy Test Function DL_2.ahk and LDistance_2.ahk? I just checked, they are in there.
Regards,
burque505

Hi Burque505, yes, I see it now; I think it was not in the previous version. Do you have a solution for limiting the computation for huge word lists?
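
One common way to cut the work for large word lists, sketched here in Python for illustration rather than in AHK or UiPath, is to skip candidates whose length differs from the search word by more than the threshold: the edit distance can never be smaller than the length difference. The function names and the `max_distance` parameter are hypothetical and not taken from the attached workflow.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def close_matches(word, word_list, max_distance):
    """Return (candidate, distance) pairs within max_distance of word."""
    matches = []
    for candidate in word_list:
        # The distance is at least the length difference, so these
        # candidates are rejected without computing any matrix rows.
        if abs(len(candidate) - len(word)) > max_distance:
            continue
        distance = levenshtein(word, candidate)
        if distance <= max_distance:
            matches.append((candidate, distance))
    return matches
```

How much this saves depends on how varied the word lengths in the list are; for lists where most entries have similar lengths it will help little.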

Hi MarkusDS, just found this today on the AHK forum: Dice’s Coefficient. Maybe let me know if you can adapt that? I’ll be happy to help if I can.
Regards,
burque505
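
For readers unfamiliar with Dice’s coefficient, here is a minimal Python sketch of the usual character-bigram formulation: twice the number of shared bigrams divided by the total number of bigrams in both strings. It is an illustration only and is not the code from the AHK forum thread; the function names and the choice to ignore case and whitespace are assumptions.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent character pairs, ignoring case and whitespace."""
    cleaned = "".join(text.lower().split())
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Sorensen-Dice similarity in [0, 1] over character bigrams."""
    pairs_a, pairs_b = bigrams(a), bigrams(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Neither string has a bigram: fall back to a plain equality check.
        same = "".join(a.lower().split()) == "".join(b.lower().split())
        return 1.0 if same else 0.0
    overlap = sum((pairs_a & pairs_b).values())  # multiset intersection
    return 2.0 * overlap / total
```

With this sketch, "night" and "nacht" share only the bigram "ht" out of eight in total, giving 0.25. Note that it scores differently from an edit distance: "tset" and "test" share no bigrams at all, so a simple transposition in a short word can score 0.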

Hi Burque,

I have looked at the thread and it seems interesting. Do you think it will improve performance radically?

I saved the Export.ahk in the UiPath folder and changed the AHK action to the function “findBestMatch”, but UiPath throws an exception saying the function doesn’t exist. I hadn’t worked with AHK before seeing this thread.

Can you help make it work? I have attached my modified xaml file (dice) Levenshtein-Damerau (2).zip (126.2 KB)

@MarkusDS .findBestMatch() is a method of a class. I believe you were importing a function before. Classes work a little differently. You’ll have to create an instance of the class before it can be used.

ssObj := New stringsimilarity()
ssObj.findBestMatch("argument1",["argument2","argument2"])

I altered the output of your test app slightly and coded how I would do it here (also dropping all strings with less than 70% match): ahkbin!
Your WordList variable doesn’t have an example that matches “tset” closely so I added one.

As for performance, I wrote the class and am always performance-oriented. In my tests, coincidentally, 10000 loops of the method took about 450 ms of CPU time, though it may take slightly longer if you feed longer strings, such as names, into the method. Good luck!

Hi Chunjee,

This seems very promising.
It sounds very good with 10000 loops in 450 ms, but I can’t seem to get it to work in UiPath. I have updated the AHK with your code, but I still can’t call the function (obviously, as you said).
This is from the original code, where I can easily call the function “LDistance”:
image
and for some reason this is not working?


My collection contains the search string and my string array.

.findBestMatch() won’t work as a standalone function if that is what you tried.

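; Word and WordList are expected to be set before this point;
; WordList is a comma-separated string.
Result := dice(Word,WordList)
msgbox % Result

; Wrapper so the class method can be called as a plain function.
dice(para_string,para_array) {
    ssObj := New stringsimilarity()
    results := ssObj.findBestMatch(para_string,StrSplit(para_array,","))
    output := ""
    For Key, Value in results.ratings {
        ; keep only matches rated above 70%
        if (Value.rating > 0.70) {
            output := Value.rating "-" Value.target "`n" output
        }
    }
    return output
}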

if you can collab uipath projects with other users, please do so. me: UiPath

So how do you use it in uipath to make a dynamic search? I want to pass the word and wordlist as inputs @chunjee

Sorry, new users can not upload attachments.

I had to delete some steps because I didn’t want to decipher everything it is trying to accomplish. Each line of the Dice.ahk output will currently be {{score}}-{{string}}, which I think MIGHT match what you had before. This seems to need a little more thought, because the Excel output is not immediately understandable to me.

I wrapped the class in a function called “dice”, which I think is what UiPath wants defined in FunctionName.
Also Result.Split(Environment.NewLine.ToArray,StringSplitOptions.RemoveEmptyEntries) seems like the correct way to split by newline, you had some weird character workaround there.
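
To make the {{score}}-{{string}} line format concrete, here is a small Python sketch of parsing such output back into pairs. It is an illustration of the format only, not part of the workflow; the function name `parse_dice_output` is hypothetical, and it assumes every non-empty line starts with a non-negative number followed by a hyphen.

```python
def parse_dice_output(result: str):
    """Turn 'score-target' lines into (score, target) tuples, best first."""
    rows = []
    for line in result.splitlines():
        if not line.strip():
            continue  # skip blank lines, similar to RemoveEmptyEntries
        # Split on the first hyphen only, since a target may contain hyphens.
        score_text, _, target = line.partition("-")
        rows.append((float(score_text), target))
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

Splitting on the first hyphen only matters for names such as "Anne-Marie", which would otherwise be cut in two.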

@Chunjee, thanks for stepping in! This will really help me also.
Regards,
burque505

Edit: Some luck maybe. See attached archive. Definitely getting closer. It is REALLY fast.
Edit: @MarkusDS, please take a look at the attached archive and let us know if it works. Thanks!

UIPATH_Distance_mod.zip (194.9 KB)

projectLevenhstein.zip (2.9 KB)

I looked for alternatives and found this one. It is only Levenshtein, but it is incredibly fast and works very easily, with no conversions needed.

The original project took ~3 minutes to compute; this one is done in one second. The only problem now is converting it to Damerau-Levenshtein.

Thanks, Markus, for sharing that. If you try UIPATH_Distance_mod.zip in the post just above yours, that also takes about one second. It’s not the same as Chunjee’s: it’s modified to accept your variables, and I tried it with your .xlsx files. It probably needs tweaking for your desired results, though.

Regards,
burque505

@MarkusDS, I tried to use your workflow but some information is missing. Regarding Damerau-Levenshtein, maybe you could adapt this VB.Net code to your “Invoke Code” activity.

Public Shared Function EditDistance(ByVal original As String, ByVal modified As String) As Integer
    Dim len_orig As Integer = original.Length
    Dim len_diff As Integer = modified.Length
    ' VB.Net array bounds are inclusive, so this allocates (len_orig + 1) x (len_diff + 1) cells.
    Dim matrix = New Integer(len_orig, len_diff) {}

    ' First column and first row: distance from or to the empty string.
    For i As Integer = 0 To len_orig
        matrix(i, 0) = i
    Next

    For j As Integer = 0 To len_diff
        matrix(0, j) = j
    Next

    For i As Integer = 1 To len_orig
        For j As Integer = 1 To len_diff
            Dim cost As Integer = If(modified(j - 1) = original(i - 1), 0, 1)
            ' Cheapest of deletion, insertion and substitution.
            matrix(i, j) = Math.Min(Math.Min(matrix(i - 1, j) + 1, matrix(i, j - 1) + 1), matrix(i - 1, j - 1) + cost)
            ' Transposition of two adjacent characters.
            If i > 1 AndAlso j > 1 AndAlso original(i - 1) = modified(j - 2) AndAlso original(i - 2) = modified(j - 1) Then
                matrix(i, j) = Math.Min(matrix(i, j), matrix(i - 2, j - 2) + cost)
            End If
        Next
    Next

    Return matrix(len_orig, len_diff)
End Function
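
For comparison, the same approach can be sketched in Python. This is the restricted Damerau-Levenshtein variant (often called optimal string alignment), in which each substring is edited at most once. The name `osa_distance` is hypothetical, and the sketch uses a fixed transposition cost of 1 where the VB.Net function above adds `cost`.

```python
def osa_distance(original: str, modified: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(original), len(modified)
    matrix = [[0] * (cols + 1) for _ in range(rows + 1)]

    # First column and first row: distance from or to the empty string.
    for i in range(rows + 1):
        matrix[i][0] = i
    for j in range(cols + 1):
        matrix[0][j] = j

    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost = 0 if original[i - 1] == modified[j - 1] else 1
            matrix[i][j] = min(matrix[i - 1][j] + 1,         # deletion
                               matrix[i][j - 1] + 1,         # insertion
                               matrix[i - 1][j - 1] + cost)  # substitution
            # Two adjacent characters swapped counts as one edit.
            if (i > 1 and j > 1
                    and original[i - 1] == modified[j - 2]
                    and original[i - 2] == modified[j - 1]):
                matrix[i][j] = min(matrix[i][j], matrix[i - 2][j - 2] + 1)

    return matrix[rows][cols]
```

With this sketch, "tset" and "test" are one edit apart, where plain Levenshtein gives two. Because it is the restricted variant, "ca" to "abc" costs 3, whereas unrestricted Damerau-Levenshtein would give 2.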

I just used Telerik’s Code Converter on @Cosin’s link above.

Yeah, it doesn’t seem to work with that converted code. It seems a decent compromise. Using Invoke Code makes it easier to do it directly.

I belatedly found this UiPath Go! link for Levenshtein distance.

Hi, it’s very coincidental. I am also in the middle of a project with the same requirement. Checking names against a bunch of sanctions lists from different regions. How did it go for you?

Any tips or suggestions on best practices? I would like to connect with you if you are ok.