Compare Names

Hi, I have this question:

I’ve got two columns that I need to compare before my process can continue:

Name on webpage | Name on system
Charlie Woods | Woods Charlie
John Stuart Bright | John Bright
Anne Carlton | Anne K. CARLTON
Tony Bennet | Lily Collins

I need to compare the names in the two columns row by row before continuing. As humans, we know that Charlie Woods and Woods Charlie are the same person, and so are Anne Carlton and Anne K. CARLTON.

How can I do this comparison so that I get a match percentage between the names and can continue if, say, the names match at 80% or more?

Hi @KevinDS,

A while back I developed an Azure Function that takes two strings and responds with similarity scores based on the Levenshtein distance between them. Basically, it measures how similar the letters, the words, and the combinations of letters and words in the two strings are.

The problem falls within statistics, linguistics, and computer science and is known as calculating the edit distance. In short: what is the smallest number of edits required to turn one string into the other? The fewer the edits, the closer the strings are to each other.
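To make the idea of "number of edits" concrete, here is a minimal sketch of the classic Levenshtein distance in plain Python. It is only an illustration of the concept, not the code running in the Azure Function, and the function name `levenshtein` is my own choice for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# "John Bright" becomes "John Stuart Bright" by inserting the 7 characters "Stuart "
print(levenshtein("John Stuart Bright", "John Bright"))
```

A raw edit count like this depends on the string lengths, which is why it is usually converted to a ratio or percentage before comparing it with a threshold.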

It is a vast field in academia, so you will find other measures of string similarity besides the Levenshtein distance, for example the Jaro–Winkler distance, or the Hamming distance for strings of the same length.
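For comparison, the Hamming distance is the simplest of these: it only counts the positions at which two equal-length strings differ. A minimal sketch (the function name `hamming` is just for this example):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(char_a != char_b for char_a, char_b in zip(a, b))


print(hamming("karolin", "kathrin"))
```

Because it cannot handle insertions or reordered words, it is rarely a good fit for names such as the ones in the question.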

In this implementation, I used the FuzzyWuzzy library in Python to build the Azure Function.

You can call it with the HTTP Request activity (GET) in UiPath.

URL syntax:
https://pyautomata.azurewebsites.net/api/stringmatcher?stringA=YOURFIRSTSTRING&stringB=YOURSECONDSTRING

Your example:
https://pyautomata.azurewebsites.net/api/stringmatcher?stringA=Charlie Woods&stringB=Woods Charlie
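Names contain spaces and sometimes other characters that should be percent-encoded in a query string. If you build the URL in code rather than typing it by hand, a sketch like the following could be used. It only constructs the URL and does not call the endpoint; `BASE_URL` and `build_url` are names chosen for this example.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URL syntax above.
BASE_URL = "https://pyautomata.azurewebsites.net/api/stringmatcher"


def build_url(string_a: str, string_b: str) -> str:
    """Build the GET URL with both names percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    query = urlencode({"stringA": string_a, "stringB": string_b}, quote_via=quote)
    return f"{BASE_URL}?{query}"


print(build_url("Charlie Woods", "Woods Charlie"))
```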

The output JSON will be:

{ "Input stringA": "Charlie Woods",
  "Input stringB": "Woods Charlie",
  "Ratio": {
    "Description": "Calculates Levenshtein distance similarity ratio of two input strings.",
    "Value": 54
  },
  "Partial_Ratio": {
    "Description": "Performs a substring matching by matching using the shortest string and recursively matching with all substrings.",
    "Value": 70
  },
  "Token_Sort_Ratio": {
    "Description": "Sorts the words in the input strings alphabetically and then calculates the Levenshtein distance similarity ratio of the two modified input strings.",
    "Value": 100
  },
  "Token_Set_Ratio": {
    "Description": "Considers common tokens in the two input strings and calculates the Levenshtein distance similarity ratio of the two modified input strings. Recommended to be used when the difference in length between two input strings is significant.",
    "Value": 100
  },
  "Documentation": {
    "Blog": "https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/",
    "GitHub": "https://github.com/seatgeek/fuzzywuzzy"
  }
}
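If you would rather compute a score locally, the token-sort idea described in the output above can be approximated with the Python standard library. This is a rough sketch under my own assumptions, not the FuzzyWuzzy implementation: it uses `difflib.SequenceMatcher` for the similarity ratio, the helper names are made up for this example, and its scores may differ from the values the service returns.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words alphabetically."""
    words = re.sub(r"[\W_]+", " ", name.lower()).split()
    return " ".join(sorted(words))


def token_sort_score(a: str, b: str) -> int:
    """Similarity from 0 to 100 that ignores word order, case, and punctuation."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def is_match(a: str, b: str, threshold: int = 80) -> bool:
    """True when the score reaches the threshold (80 as in the question)."""
    return token_sort_score(a, b) >= threshold


rows = [
    ("Charlie Woods", "Woods Charlie"),
    ("John Stuart Bright", "John Bright"),
    ("Anne Carlton", "Anne K. CARLTON"),
    ("Tony Bennet", "Lily Collins"),
]
for webpage_name, system_name in rows:
    score = token_sort_score(webpage_name, system_name)
    print(webpage_name, "|", system_name, "|", score, "|", is_match(webpage_name, system_name))
```

With this sketch, "John Stuart Bright" and "John Bright" score below 80 because the middle name counts against them. That is the kind of case where a set-based score such as Token_Set_Ratio is the better one to check against your threshold.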

Hope this clears things up a bit.


@jeevith

Very informative

Hi @KevinDS

I have designed a simple workaround. I am attaching the workflow. Let me know if it works for you.

Compare Names.xaml (9.1 KB)

Regards
Varun Kumar

Thank you, this can come in very handy. I’ll just build on it a bit.
