I have attached the workflow file; please look at Levenshtein-Damerau.xaml. I edited the fuzzy test function LD.ahk to use func%DLDistance, as it was only showing plain Levenshtein results. With DLD it uses the correct Damerau-Levenshtein distance.
Most of the actions transform data between arrays, strings, DataTables (dt), etc.
I have reduced the end product by applying the threshold, but it still does a lot of unnecessary computing.
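One cheap way to cut unnecessary computing is to discard candidates before any distance is calculated. An edit distance can never be smaller than the difference in length between the two strings, so a length check alone rules out some pairs. The following is a minimal Python sketch, assuming the threshold is expressed as a maximum edit distance (if it is a similarity percentage it would need converting first); the function name `length_prefilter` is hypothetical and not part of the attached workflow.

```python
def length_prefilter(query, candidates, max_distance):
    """Keep candidates that could still be within max_distance of query.

    The edit distance between two strings is at least the difference
    of their lengths, so anything failing this check can be skipped
    without running the full distance calculation.
    """
    return [c for c in candidates if abs(len(c) - len(query)) <= max_distance]


if __name__ == "__main__":
    words = ["tset", "testing", "t", "toast"]
    # Only the survivors need the expensive distance computation.
    print(length_prefilter("test", words, 1))
```

The filter only removes impossible matches; every surviving candidate still has to be scored.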
Hi MarkusDS, just found this today on the AHK forum: Dice’s Coefficient. Maybe let me know if you can adapt that? I’ll be happy to help if I can.
Regards,
burque505
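For readers who have not met it, Dice's coefficient scores two strings by how many adjacent character pairs (bigrams) they share: twice the number of shared pairs divided by the total number of pairs in both strings. Below is a minimal Python sketch of that idea, not the AHK code from the linked thread; the names `bigrams` and `dice_coefficient` are illustrative, and ignoring case and whitespace is an assumption.

```python
from collections import Counter


def bigrams(text):
    """Return the multiset of adjacent character pairs, ignoring case and whitespace."""
    cleaned = "".join(text.lower().split())
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def dice_coefficient(a, b):
    """Dice similarity on character bigrams, from 0.0 (nothing shared) to 1.0."""
    pairs_a, pairs_b = bigrams(a), bigrams(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Neither string has a bigram: fall back to exact comparison.
        same = "".join(a.lower().split()) == "".join(b.lower().split())
        return 1.0 if same else 0.0
    # Counter intersection keeps the minimum count of each shared pair.
    overlap = sum((pairs_a & pairs_b).values())
    return 2.0 * overlap / total


if __name__ == "__main__":
    print(dice_coefficient("night", "nacht"))  # one shared pair ("ht") out of eight
    print(dice_coefficient("tset", "test"))    # no shared pairs at all
```

Note that a transposition such as "tset" versus "test" shares no bigrams in this sketch, so short strings with swapped letters can score much lower than they would with a Damerau-Levenshtein distance.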
I have looked at the thread and it seems interesting. Do you think it will improve performance radically?
I saved Export.ahk in the UiPath folder and changed the AHK action to the function “findBestMatch”, but UiPath throws an exception saying the function doesn’t exist. I hadn’t worked with AHK before seeing this thread.
Can you help make it work? I have attached my modified xaml file (dice) Levenshtein-Damerau (2).zip (126.2 KB)
@MarkusDS .findBestMatch() is a method of a class. I believe you were importing a function before. Classes work a little differently. You’ll have to create an instance of the class before it can be used.
    ssObj := New stringsimilarity()
    ssObj.findBestMatch("argument1",["argument2","argument2"])
I altered the output of your test app slightly and coded how I would do it here (also dropping all strings with less than 70% match): ahkbin!
Your WordList variable doesn’t have an example that matches “tset” closely so I added one.
As for performance, I wrote the class and am always performance-oriented. In my tests, which coincidentally also involved 10000 loops of the method, the result was about 450 ms of CPU time, though you may see slightly longer times if feeding longer strings like names into the method. Good luck!
This seems very promising.
10000 loops in 450 ms sounds very good, but I can’t seem to get it to work in UiPath. I have updated the AHK with your code, but I still can’t call the function (obviously, like you said).
This is from the original code, where I can easily call the function “LDistance”,
and for some reason this is not working?
I had to delete some steps because I didn’t want to decipher everything it is trying to accomplish. Each line of the Dice.ahk output will currently be {{score}}-{{string}}, which I think MIGHT match what you had before. This seems to need a little more thought, because the Excel output is not immediately understandable to me.
I wrapped the class in a function called “dice” which I think UiPath wants defined in FunctionName.
Also Result.Split(Environment.NewLine.ToArray,StringSplitOptions.RemoveEmptyEntries) seems like the correct way to split by newline, you had some weird character workaround there.
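To make the {{score}}-{{string}} output format concrete, here is a minimal Python sketch of the same post-processing: split on newlines, drop empty entries, and separate each line into a score and a string. It is an illustration only, not the UiPath expression itself. The function name `parse_scored_lines`, the assumption that the score is a plain decimal number, and the 0.7 default (echoing the 70% cut-off mentioned earlier in the thread) are all hypothetical.

```python
def parse_scored_lines(result, threshold=0.7):
    """Parse 'score-string' lines and keep those scoring at least threshold."""
    matches = []
    # splitlines() handles both "\r\n" and "\n" line endings.
    for line in result.splitlines():
        line = line.strip()
        if not line:
            continue  # same effect as RemoveEmptyEntries
        # Split at the first hyphen only, so hyphenated names stay intact.
        score_text, _, candidate = line.partition("-")
        score = float(score_text)
        if score >= threshold:
            matches.append((score, candidate))
    return matches


if __name__ == "__main__":
    sample = "0.85-test\r\n\r\n0.40-toast\n0.70-Jean-Luc\n"
    for score, candidate in parse_scored_lines(sample):
        print(score, candidate)
```

Splitting at the first hyphen matters for name matching, since many names contain hyphens of their own.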
@Chunjee, thanks for stepping in! This will really help me also.
Regards,
burque505
Edit: Some luck maybe. See attached archive. Definitely getting closer. It is REALLY fast.
Edit: @MarkusDS, please take a look at the attached archive and let us know if it works. Thanks!
I tried to look for alternatives and found this one. It is only Levenshtein, but it is incredibly fast and works very easily, without needing conversions.
The original project took ~3 minutes to compute; this one is done in one second. The only problem now is to convert it to Damerau-Levenshtein.
Thanks, Markus, for sharing that. If you try UIPATH_Distance_mod.zip in the post just above yours, that also takes about one second. It’s not the same as Chunjee’s; it’s modified to accept your variables, and I tried it with your .xlsx files. It probably needs tweaking for your desired results, though.
@MarkusDS, I tried to use your workflow but some information is missing. Regarding Damerau-Levenshtein, maybe you could adapt this VB.Net code to your “Invoke Code” activity.
    ' Damerau-Levenshtein distance (optimal string alignment variant).
    ' vals.Min() requires Imports System.Linq.
    Public Shared Function EditDistance(ByVal original As String, ByVal modified As String) As Integer
        Dim len_orig As Integer = original.Length
        Dim len_diff As Integer = modified.Length
        ' VB.Net array bounds are inclusive, so this is (len_orig + 1) x (len_diff + 1).
        Dim matrix = New Integer(len_orig, len_diff) {}
        ' First column and first row: distance to and from the empty string.
        For i As Integer = 0 To len_orig
            matrix(i, 0) = i
        Next
        For j As Integer = 0 To len_diff
            matrix(0, j) = j
        Next
        For i As Integer = 1 To len_orig
            For j As Integer = 1 To len_diff
                Dim cost As Integer = If(modified(j - 1) = original(i - 1), 0, 1)
                ' Deletion, insertion, substitution.
                Dim vals = New Integer() {matrix(i - 1, j) + 1, matrix(i, j - 1) + 1, matrix(i - 1, j - 1) + cost}
                matrix(i, j) = vals.Min()
                ' Transposition of two adjacent characters.
                If i > 1 AndAlso j > 1 AndAlso original(i - 1) = modified(j - 2) AndAlso original(i - 2) = modified(j - 1) Then
                    matrix(i, j) = Math.Min(matrix(i, j), matrix(i - 2, j - 2) + cost)
                End If
            Next
        Next
        Return matrix(len_orig, len_diff)
    End Function
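To check the logic outside UiPath, the same recurrence can be written as a short Python sketch. It mirrors the VB.Net function above line for line and is not a separate algorithm; the name `osa_distance` is illustrative. This is the restricted "optimal string alignment" form of Damerau-Levenshtein, which counts a swap of two adjacent characters as one edit but never edits the same substring twice.

```python
def osa_distance(original, modified):
    """Edit distance with insert, delete, substitute and adjacent transposition."""
    rows, cols = len(original), len(modified)
    matrix = [[0] * (cols + 1) for _ in range(rows + 1)]
    # First column and first row: distance to and from the empty string.
    for i in range(rows + 1):
        matrix[i][0] = i
    for j in range(cols + 1):
        matrix[0][j] = j
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost = 0 if original[i - 1] == modified[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # deletion
                matrix[i][j - 1] + 1,         # insertion
                matrix[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped, as in the VB.Net version.
            if (i > 1 and j > 1
                    and original[i - 1] == modified[j - 2]
                    and original[i - 2] == modified[j - 1]):
                matrix[i][j] = min(matrix[i][j], matrix[i - 2][j - 2] + cost)
    return matrix[rows][cols]


if __name__ == "__main__":
    print(osa_distance("tset", "test"))       # one transposition
    print(osa_distance("kitten", "sitting"))  # no transpositions involved
```

For "tset" versus "test" this gives 1, where plain Levenshtein gives 2, which is the difference the thread is after. Because of the restriction, a pair such as "ca" and "abc" scores 3 here rather than the 2 an unrestricted Damerau-Levenshtein distance would give.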
I just used Telerik’s Code Converter on @Cosin’s link above.
Hi, it’s very coincidental. I am also in the middle of a project with the same requirement. Checking names against a bunch of sanctions lists from different regions. How did it go for you?
Any tips or suggestions on best practices? I would like to connect with you if that is OK.