Fuzzy String Comparison


#1

I need to compare two strings that are similar but not exactly the same. There might be some additional words in one string, or the words might be swapped in some places, even though both strings represent the same thing.

For example:

  1. Google and Google Pvt. Ltd
  2. Computer Generated Solutions and Computer Generated Soltns
  3. Societe Generale and Generale Societe

I need something that can recognize these as the same.
The Levenshtein algorithm doesn't work because the words need to be in the same order for it to work.

Thanks
Raviteja


#2

@Raviteja94 Compare the two strings using Contains in the if condition.


#3

@indra It won't work for the 2nd and 3rd scenarios.
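
A quick way to confirm this is the sketch below, which applies a case-insensitive Contains-style check in both directions to the three example pairs. The class name `ContainsCheck` and the helper `EitherContains` are made up for illustration and are not part of any library.

```csharp
using System;

public static class ContainsCheck
{
    // Hypothetical helper: true when either string contains the other, ignoring case.
    public static bool EitherContains(string a, string b)
    {
        return a.IndexOf(b, StringComparison.OrdinalIgnoreCase) >= 0
            || b.IndexOf(a, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Scenario 1: one name is a prefix of the other, so the check passes.
        Console.WriteLine(EitherContains("Google", "Google Pvt. Ltd"));                                 // True
        // Scenario 2: an abbreviated word breaks the substring match.
        Console.WriteLine(EitherContains("Computer Generated Solutions", "Computer Generated Soltns")); // False
        // Scenario 3: swapped word order breaks the substring match.
        Console.WriteLine(EitherContains("Societe Generale", "Generale Societe"));                      // False
    }
}
```

Only the first pair passes, so a plain substring check is not enough here.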


#4

Hi @Raviteja94,

Refer to this link:

http://www.dotnetworld.in/2013/05/c-find-similarity-between-two-strings.html?m=1

Based on the percentage, you can decide whether the strings match.

Similarity in % between Strings
// Requires: using System; using System.Linq;
public static void SmilarityinPercentage()
{
  string string1 = "Manish";
  string string2 = "Mahesh";
  char[] charString1 = string1.ToCharArray();
  char[] charString2 = string2.ToCharArray();
  // Intersect returns the distinct characters that occur in both strings (here M, a, s, h)
  var strCommon = charString1.Intersect(charString2);
  //Formula : Similarity (%) = 100 * (CommonItems * 2) / (Length of String1 + Length of String2)
  double Similarity = (double)(100 * (strCommon.Count() * 2)) / (charString1.Length + charString2.Length);
  Console.WriteLine("Strings are {0}% similar", Similarity.ToString("0.00"));
}
//Output:- Strings are 66.67% similar

Similarity in % between Arrays of String
public void SmilarityinPercentage()
{
  string[] string1 = new string[] {"Manish","Dubey", "Dot", "Net","World" };
  string[] string2 = new string[] { "Dot", "Net", "World" };
  var strCommon = string1.Intersect(string2);
  //Formula : Similarity (%) = 100 * (CommonItems * 2) / (Length of String1 + Length of String2)
  double Similarity = (double)(100 * (strCommon.Count() * 2)) / (string1.Length + string2.Length);
  Console.WriteLine("Strings are {0}% similar", Similarity.ToString("0.00"));
}
//Output:- Strings are 75.00% similar

Similarity in % between String Sentences
public void SmilarityinPercentage()
{
  string string1 = "My blog name is Dot Net World";
  string string2 = "Dot Net World";
  string[] splitString1 = string1.Split(' ');
  string[] splitString2 = string2.Split(' ');
  var strCommon = splitString1.Intersect(splitString2);
  //Formula : Similarity (%) = 100 * (CommonItems * 2) / (Length of String1 + Length of String2)
  double Similarity = (double)(100 * (strCommon.Count() * 2)) / (splitString1.Length + splitString2.Length);
  Console.WriteLine("Strings are {0}% similar", Similarity.ToString("0.00"));
}
//Output:- Strings are 60.00% similar
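
The sentence version above only counts words that match exactly, so "Soltns" and "Solutions" would not count as common. One possible extension, shown here as a rough sketch and not as code from the linked article, is to compare the two names word by word in any order and score each word pair with Levenshtein distance. The class `FuzzyCompany` and its methods are hypothetical names, and the choice to score from the shorter name's side is an assumption made to fit the three example pairs.

```csharp
using System;
using System.Linq;

public static class FuzzyCompany
{
    // Classic dynamic-programming Levenshtein edit distance between two words.
    public static int Levenshtein(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= t.Length; j++) d[0, j] = j;
        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[s.Length, t.Length];
    }

    // Word-level similarity in [0, 1]: 1.0 means identical words.
    public static double WordSimilarity(string x, string y)
    {
        int maxLen = Math.Max(x.Length, y.Length);
        return maxLen == 0 ? 1.0 : 1.0 - (double)Levenshtein(x, y) / maxLen;
    }

    // Lower-case the text and split it into words on spaces, dots and commas.
    public static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant()
                   .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // For every word of the shorter name, take its best match in the longer name
    // (word order is ignored), then average those best scores.
    public static double TokenSimilarity(string a, string b)
    {
        string[] ta = Tokenize(a), tb = Tokenize(b);
        if (ta.Length == 0 || tb.Length == 0) return 0.0;
        string[] shorter = ta.Length <= tb.Length ? ta : tb;
        string[] longer = ta.Length <= tb.Length ? tb : ta;
        return shorter.Average(tok => longer.Max(other => WordSimilarity(tok, other)));
    }

    public static void Main()
    {
        Console.WriteLine(TokenSimilarity("Google", "Google Pvt. Ltd").ToString("0.00"));
        Console.WriteLine(TokenSimilarity("Computer Generated Solutions", "Computer Generated Soltns").ToString("0.00"));
        Console.WriteLine(TokenSimilarity("Societe Generale", "Generale Societe").ToString("0.00"));
    }
}
```

With this scoring, extra words in the longer name are not penalised, so "Google" would also score highly against any other name that contains the word "Google". The same word in the longer name can also be matched more than once. The acceptance threshold would need tuning against real data.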

Regards, Arivu


#5

@Raviteja94
You can use the cognitive service activities.



#6

@Madhavi Can you briefly explain how I can use this for my solution?

Thanks
Raviteja