Find string partial similarity percentage

amithvs · February 21, 2023, 2:22pm

I am trying to find the similarity percentage of two strings (first name and last name), i.e. a metric that tells us how similar two strings are.
I have tried two approaches:

1. Levenshtein distance: not preferred, as the results did not agree well with a manual (eyeballed) comparison.
2. SequenceMatcher from Python's difflib library: the best option I have found so far. It gives a reasonable similarity percentage in most cases, but the percentage drops if the order of the first and last names is jumbled.

For example:

John Smith vs John Smith = 100%
but
John Smith vs Smith John = 50%
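
The two figures above can be reproduced with a minimal sketch like the following. It assumes the comparison was done with `difflib.SequenceMatcher(...).ratio()` on the raw strings; the helper name `similarity` is made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) for two strings."""
    # The first argument is the optional "junk" filter; None disables it.
    return SequenceMatcher(None, a, b).ratio()


print(similarity("John Smith", "John Smith"))  # 1.0
print(similarity("John Smith", "Smith John"))  # 0.5
```

The second score is low because SequenceMatcher compares characters in order: only the block "Smith" lines up, so 5 of the 10 characters on each side match.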

But both can be the same person. Is there any method that can handle this? Also, the number of words in a name is not fixed; some have a first name + second name + third name, etc.

Is there any way to identify such cases, or to get a better similarity percentage, when the order of the names is reversed? I'm open to any other methods or techniques.

Some sample data is below. Ideally, String 1 and String 2 should be treated as the same, but how can that be achieved?

String 1	String 2
will smith	smith will
Christian Max payne	Payne Max Christian
John Max William Defoe	William Defoe John Max

arivu96 · February 21, 2023, 2:26pm

Hi @amithvs ,

1. Check for an exact match: strVal1.ToLower() = strVal2.ToLower()
2. If that does not match, check whether either string contains the other: strVal1.ToLower().Contains(strVal2.ToLower()) or strVal2.ToLower().Contains(strVal1.ToLower())
3. If that also fails, split the string on spaces with strVal1.ToLower().Split(" "c) and check that every resulting value is found in strVal2.
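
Since the original question used Python, here is a rough Python sketch of the same three checks. The function name `names_match` is hypothetical, and the empty-string guard is an added assumption that is not part of the steps above.

```python
def names_match(name1: str, name2: str) -> bool:
    """Case-insensitive check: exact match, containment, then word-by-word."""
    a, b = name1.lower().strip(), name2.lower().strip()

    # Assumption: an empty name only matches another empty name.
    if not a or not b:
        return a == b

    # Step 1: exact match.
    if a == b:
        return True

    # Step 2: one string contains the other.
    if a in b or b in a:
        return True

    # Step 3: split on whitespace and check every word of the first
    # name appears among the words of the second name.
    b_words = set(b.split())
    return all(word in b_words for word in a.split())


print(names_match("will smith", "smith will"))
print(names_match("Christian Max payne", "Payne Max Christian"))
```

Note that this gives a yes/no answer rather than a percentage, and step 3 is one-directional: it checks the words of the first name against the second, not the reverse.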

Regards,
Arivu

supermanPunch · February 21, 2023, 3:05pm

Hi @amithvs ,

Check the post below on obtaining the similarity using a .NET package. Other methods are also proposed in that thread. Let us know if you find it helpful.

arivu96 · February 21, 2023, 3:14pm

Hi @amithvs ,

Please follow the post @supermanPunch shared. It's easy to compare and get the percentage match.

Regards,
Arivu

amithvs · February 21, 2023, 3:21pm

Thanks, I will try it and check whether the results are better than those from the Python library I was using.

amithvs · February 21, 2023, 3:23pm

Hi, I can already get the similarity percentage; my problem is getting a better similarity percentage when the names are jumbled. I will give it a shot. Thanks.

amithvs · February 22, 2023, 8:29am

I have tried this in my case and these are the results from different algorithms. This was a wonderful learning experience for me. Thank you very much for sharing this.

Is there any way we can improve the results? I am looking for a suitable threshold value that can catch names that are reversed or in a different order. For example, I could build the code so that anything above 80% counts as a partial match of the names.

I'm open to options other than RPA. Are there any machine learning or Python activities?
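
One possible Python approach to the word-order problem, offered as a sketch rather than a definitive answer: sort the words of each name before comparing, so that reordered names produce identical strings. The names `token_sort_ratio`, `is_probable_match` and `THRESHOLD` are illustrative, and the 0.8 cutoff is simply the 80% figure mentioned above, not a tuned value.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # assumption: taken from the 80% figure in the post


def token_sort_ratio(name1: str, name2: str) -> float:
    """Similarity (0.0 to 1.0) that ignores case and word order."""

    def normalize(name: str) -> str:
        # Lowercase, split on whitespace, sort the words, rejoin.
        return " ".join(sorted(name.lower().split()))

    return SequenceMatcher(None, normalize(name1), normalize(name2)).ratio()


def is_probable_match(name1: str, name2: str, threshold: float = THRESHOLD) -> bool:
    """True when the order-insensitive similarity reaches the threshold."""
    return token_sort_ratio(name1, name2) >= threshold


pairs = [
    ("will smith", "smith will"),
    ("Christian Max payne", "Payne Max Christian"),
    ("John Max William Defoe", "William Defoe John Max"),
]
for s1, s2 in pairs:
    print(s1, "|", s2, "|", token_sort_ratio(s1, s2))
```

All three sample pairs score 1.0 with this normalization, since they differ only in case and word order. Typos or missing middle names will still lower the score, so the threshold would need checking against real data. Third-party libraries such as RapidFuzz provide a comparable token-sort scorer if installing a package is an option.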
