Hi @KevinDS,
A while back I developed an Azure Function that takes two strings and returns similarity scores based on the Levenshtein distance between them. In short, it tells you how similar the two strings are, whether they differ in individual letters or in whole words.
The problem sits at the intersection of statistics, linguistics and computer science, and is known as computing the edit distance: the smallest number of edits needed to turn one string into the other. For the Levenshtein distance, an edit is a single-character insertion, deletion or substitution. The fewer edits required, the closer the two strings are to each other.
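To make the definition concrete, here is a minimal sketch of the classic dynamic-programming (Wagner–Fischer) approach in Python. It is a textbook illustration, not the code inside the Azure Function, which relies on FuzzyWuzzy instead; the function name `levenshtein` is my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP)."""
    # prev[j] = distance between the previous prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

"kitten" becomes "sitting" with three edits: two substitutions (k→s, e→i) and one insertion (g). Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.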
Edit distance is a large research area, so besides the Levenshtein distance you will also find other string-similarity measures, for example the Jaro–Winkler distance, or the Hamming distance for strings of equal length.
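For comparison, the Hamming distance only counts substitutions, which is why it requires strings of equal length. A minimal Python sketch (the function name `hamming` is my own):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for strings of equal length")
    # Count the positions where the characters differ
    return sum(ca != cb for ca, cb in zip(a, b))


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))  # 3
```

For equal-length strings the Levenshtein distance is never larger than the Hamming distance, because substituting at every mismatched position is always one valid sequence of edits.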
In this implementation, I used the FuzzyWuzzy library in Python to build the Azure Function.
To call it from UiPath, use the HTTP Request activity with the GET method.
URL syntax:
https://pyautomata.azurewebsites.net/api/stringmatcher?stringA=YOURFIRSTSTRING&stringB=YOURSECONDSTRING
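Your example (spaces in the query string should be sent percent-encoded as %20; many HTTP clients do this automatically):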
https://pyautomata.azurewebsites.net/api/stringmatcher?stringA=Charlie Woods&stringB=Woods Charlie
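If you want to call the function from Python, or just see what the correctly encoded URL looks like, here is a small sketch using only the standard library. The helper names `build_url`, `extract_scores` and `fetch_scores` are my own, and the endpoint may no longer be online. Passing `quote_via=urllib.parse.quote` to `urlencode` encodes spaces as `%20` (and characters such as `&` as `%26`) instead of the default `+`.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the post above; it may not be online any more.
BASE_URL = "https://pyautomata.azurewebsites.net/api/stringmatcher"


def build_url(string_a: str, string_b: str) -> str:
    """Build the GET URL with both query parameters percent-encoded."""
    query = urllib.parse.urlencode(
        {"stringA": string_a, "stringB": string_b},
        quote_via=urllib.parse.quote,  # spaces become %20 instead of '+'
    )
    return f"{BASE_URL}?{query}"


def extract_scores(payload: dict) -> dict:
    """Keep only the numeric 'Value' of each score object in the response."""
    return {
        name: entry["Value"]
        for name, entry in payload.items()
        if isinstance(entry, dict) and "Value" in entry
    }


def fetch_scores(string_a: str, string_b: str, timeout: float = 10.0) -> dict:
    """Call the endpoint and return something like {'Ratio': 54, ...}."""
    with urllib.request.urlopen(build_url(string_a, string_b), timeout=timeout) as resp:
        return extract_scores(json.load(resp))


if __name__ == "__main__":
    print(build_url("Charlie Woods", "Woods Charlie"))
```

`extract_scores` skips entries such as "Input stringA" and "Documentation" that carry no "Value", so you get a flat mapping from score name to number that is easy to compare against a threshold.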
The JSON response will be:
{
  "Input stringA": "Charlie Woods",
  "Input stringB": "Woods Charlie",
  "Ratio": {
    "Description": "Calculates Levenshtein distance similarity ratio of two input strings.",
    "Value": 54
  },
  "Partial_Ratio": {
    "Description": "Performs a substring matching by matching using the shortest string and recursively matching with all substrings.",
    "Value": 70
  },
  "Token_Sort_Ratio": {
    "Description": "Sorts the words in the input strings alphabetically and then calculates the Levenshtein distance similarity ratio of the two modified input strings.",
    "Value": 100
  },
  "Token_Set_Ratio": {
    "Description": "Considers common tokens in the two input strings and calculates the Levenshtein distance similarity ratio of the two modified input strings. Recommended to be used when the difference in length between two input strings is significant.",
    "Value": 100
  },
  "Documentation": {
    "Blog": "https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/",
    "GitHub": "https://github.com/seatgeek/fuzzywuzzy"
  }
}
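To see roughly where these numbers come from, or to score strings locally without calling the function, the sketch below approximates the four scores with Python's built-in `difflib.SequenceMatcher`. FuzzyWuzzy itself falls back to difflib when python-Levenshtein is not installed, but this is a simplified re-implementation written for illustration, not the code behind the endpoint: the function names only mirror FuzzyWuzzy's, preprocessing is reduced to replacing non-word characters with spaces and lowercasing, and results may differ from the library's for some inputs.

```python
import re
from difflib import SequenceMatcher


def _score(r: float) -> int:
    """Scale a 0..1 similarity to an integer percentage."""
    return int(round(100 * r))


def _process(s: str) -> str:
    """Simplified normalisation: non-word chars -> spaces, lowercase, trim."""
    return re.sub(r"\W", " ", s).lower().strip()


def ratio(s1: str, s2: str) -> int:
    """difflib similarity 2*M/T (M = matched chars, T = total chars), as 0..100."""
    return _score(SequenceMatcher(None, s1, s2).ratio())


def partial_ratio(s1: str, s2: str) -> int:
    """Best ratio between the shorter string and same-length windows of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    best = 0.0
    for a_start, b_start, _size in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        # Align the shorter string with the longer one at this matching block
        start = max(b_start - a_start, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return _score(best)


def token_sort_ratio(s1: str, s2: str) -> int:
    """Sort the words alphabetically, then compare with ratio()."""
    def sort_tokens(s: str) -> str:
        return " ".join(sorted(_process(s).split()))
    return ratio(sort_tokens(s1), sort_tokens(s2))


def token_set_ratio(s1: str, s2: str) -> int:
    """Compare the shared words with each string's shared-plus-leftover words."""
    t1, t2 = set(_process(s1).split()), set(_process(s2).split())
    common = " ".join(sorted(t1 & t2))
    rest1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    rest2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(ratio(common, rest1), ratio(common, rest2), ratio(rest1, rest2))


if __name__ == "__main__":
    a, b = "Charlie Woods", "Woods Charlie"
    print(ratio(a, b), partial_ratio(a, b), token_sort_ratio(a, b), token_set_ratio(a, b))
    # 54 70 100 100
```

For this pair the sketch prints the same values as the response above. The two token-based scores reach 100 because both strings contain exactly the same words, only in a different order, which is why they are usually the better choice for names written as "first last" versus "last first".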
Hope this clears things up a bit.