Hi there! I’m trying to do something where Levenshtein distance could help. Does UiPath have something like that?
Nothing built-in that I know of, but it shouldn’t be hard to build a custom code activity for it.
Hello, @carmen! I need an activity that computes Levenshtein distance. Did you make it?
Hi @Bernardo_Ferreira, I’m sorry I didn’t.
@Bernardo_Ferreira, if you can settle for what I think is Damerau-Levenshtein instead of Levenshtein, try this workflow below.
You’ll need a “Run Auto Hot Key Script”, so that means installing UiPath.Script.Activities if you don’t already have it.
The results of wordIn = “tset”, listIn = “random,task,test,text,toast” are:
I think true Levenshtein would be:
but I may well have it backwards. Either way, if you fiddle with the script you can use either, I suppose.
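For anyone wanting to check which metric is which, here is a rough Python sketch (not the AHK script from the attachment) of both distances. It assumes the "optimal string alignment" variant of Damerau-Levenshtein, i.e. plain Levenshtein plus swapping two adjacent characters as a single edit; the function names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


if __name__ == "__main__":
    for word in "random,task,test,text,toast".split(","):
        print(word, levenshtein("tset", word), osa("tset", word))
```

With “tset” as the input, the two functions only disagree on “test”: the swapped “se” counts as two substitutions for Levenshtein (distance 2) but as one transposition for the Damerau-style variant (distance 1).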
I think if you want to pass in a collection as anything but a string you’ll have to convert this to an “Invoke Code” activity or “AHKActivities”.
There’s a link in the script “Fuzzy Test Function.ahk” to the AutoHotkey forum, or here, where I got the script that I modified for the workflow. The original “Fuzzy Test.ahk” is included also, as is the “LDistance.ahk” library itself.
Levenshtein.zip (4.3 KB)
EDIT: Okay, here are two workflows: L3 is Damerau-Levenshtein, L4 is Levenshtein.
Levenshtein_and_Damerau-Levenshtein.zip (15.8 KB)
Levenshtein screenshot confirming my suspicions above:
Thank you @burque505. I will try to see if I can solve my issue with this.
Good luck with it, Bernardo. If it works for you please let me know. Thanks!
I got it working on my colleague’s computer, but when I tried to execute it on mine, I found out that I don’t have UiPath.Script.Activities. Can you help me install it, please?
I got it working. For future reference, here is the link for the UiPath.Script.Activities package:
If you need help installing a package, use this:
@Bernardo_Ferreira, glad to hear it!
You could also try this in the Code activity
Updated to reflect changes in code at the AHK link above.
Very interested in this, but company content restrictions don’t let me download zip files. Could you upload the .xamls separately, please?
Thanks for the snippet. Would it be possible to set a sensitivity on it, e.g. 0.7, so we only see matches that need less than 30% of the characters changed?
Or can you help with a code snippet that would put an upper limit on the distance? It would be an Int32 input from UiPath, so that the calculation would be faster for bigger data sets. There is no reason for it to use CPU on very unlikely matches.
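One way to turn a sensitivity such as 0.7 into a distance cap, as a rough Python sketch rather than anything from the AHK scripts in this thread. It assumes the cap is based on the length of the search word and is rounded down; `max_distance` and `min_match_percent` are made-up names. Integer percentages are used so the rounding is exact.

```python
def max_distance(word: str, min_match_percent: int) -> int:
    """Largest edit distance still counted as a match (rounded down).

    Assumption: the allowed share of edits is measured against the
    length of the search word, not the candidate.
    """
    allowed_percent = 100 - min_match_percent
    return len(word) * allowed_percent // 100


if __name__ == "__main__":
    for word in ("DONALDBARACKGEORGE", "UiPath", "test"):
        print(word, max_distance(word, 70))
```

The resulting integer is the kind of value that could be passed in as maxdist. Note that short words get a very small cap: at 70%, a four-letter word allows only one edit.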
Since I don’t know how you’ll be using this I’m kind of shooting in the dark, but try this lib and function script. You would pass Word, WordList, and maxdistance to the called AHK script (here, “Fuzzy Test Function DL_2.ahk”, which is Damerau-Levenshtein). I haven’t tried it on any large dataset, so YMMV. Here I’ve hard coded the vars. In UiPath, just call the function ScoreIt(Word, WordList) rather than executing the script itself.
#Include LDistance_2.ahk
global Word := "DONALDBARACKGEORGE"
global WordList := "random,task,test,text,toast"
global maxdist := 2
Back to hard coded again:
#Include LDistance_2.ahk
global Word := "UiPath"
global WordList := "random,task,test,text,toast"
global maxdist := 2
With a matching word:
#Include LDistance_2.ahk
global Word := "test"
global WordList := "random,task,test,text,toast"
global maxdist := 2
In the accompanying lib, LDistance_2.ahk, I just commented out this line:
;maxdist := a.Length() + b.Length()
so “maxdist” can be passed in to the function.
Hope this helps, let me know.
Thanks for taking your time to help.
I will be using it to check names against PEP and EU sanction lists. I have calculated a threshold equal to a 70% match, and use it as a filter once the fuzzy script has finished.
I have my search name and have converted the PEP and EU sanction lists into a long array of strings. This works fine. It thoroughly checks every entity in my lists (1000+). Some of them are very long and return scores of 22+, which are not needed. I then split the result from the fuzzy script into a datatable, where I apply my previously calculated threshold, so I only have the results with a 70% match or higher. The calculation takes approx. 3 minutes now, but most results are useless.
In the examples you provided it is still calculating 11-task, so did the maxdist change anything?
I want it to stop calculating the current word and move on if it passes the limit. It also seems that your attachment is unavailable.
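The "stop and move on" behaviour can be sketched roughly like this in Python. This is not the LDistance_2.ahk code, just an illustration of the idea for plain Levenshtein: the smallest value in each row of the distance table never decreases from one row to the next, so once a whole row is above the limit the word can be abandoned. `bounded_levenshtein` is a hypothetical name, and it returns None for "over the limit".

```python
from typing import Optional


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Levenshtein distance, or None as soon as it must exceed max_dist."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        # Every later row is at least as large as this row's minimum,
        # so nothing below can bring the result back under the limit.
        if min(cur) > max_dist:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


if __name__ == "__main__":
    for word in "random,task,test,text,toast".split(","):
        print(word, bounded_levenshtein("tset", word, 2))
```

With a limit of 2 and “tset” as input, “random” is abandoned after the third row instead of being scored all the way through. The saving grows with the length of the candidate, which is where the scores of 22+ come from.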
I will then pass the calculated threshold (a rounded-down number) along as an input, just like Word and WordList in UiPath, as it varies depending on the search word.
We will also be doing a fuzzy search against a database (with 10000+ entities), and it is for this part that the distance limit (equal to the in_argument of the calculated threshold) could save time, I hope.
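For the larger database case, a cheap pre-filter might help before any distance is calculated at all: the edit distance between two strings can never be smaller than the difference in their lengths, so candidates that are too long or too short can be skipped outright. A minimal Python sketch of that idea, with `length_prefilter` as a made-up name and the list standing in for the database rows:

```python
from typing import Iterable, List


def length_prefilter(word: str, candidates: Iterable[str], max_dist: int) -> List[str]:
    """Keep only candidates whose length is within max_dist of the word's.

    Anything dropped here would need more than max_dist insertions or
    deletions, so it could not pass the distance limit anyway.
    """
    return [c for c in candidates if abs(len(c) - len(word)) <= max_dist]


if __name__ == "__main__":
    names = "random,task,test,text,toast".split(",")
    print(length_prefilter("tset", names, 1))
    print(length_prefilter("DONALDBARACKGEORGE", names, 2))
```

Only the survivors would then go to the fuzzy script. The same length comparison could presumably be done in the database query itself, though that depends on the database being used.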
I re-added the attachment - for the moment it appears to be available. If you can share your workflow, maybe I can help, but at the moment I just don’t have enough information. You can try varying maxdist in the AHK script, and if that doesn’t help, we’ll figure out something else.