Identifying Near Duplicate Strings

Hello all,

I have a scenario where I need to identify near-duplicate strings. Take the names "Sandeep Joice" and "Snadeep Joice", for example. These are not exact duplicates, but they are near duplicates.

Any leads on how to identify them?

Regards
Joice

Hi,
I hope these steps help you resolve this.
- Use an Assign activity like this:
List_strings = Split("yourinputstring"," ").ToList()
- Next, use a While loop with this condition:
counter < List_strings.Count
where counter is an Int32 variable with a default value of 0, defined in the Variables panel.
- Inside the While loop, use an Assign activity like this:
occurrence = 0
where occurrence is an Int32 variable that counts how many times the current string is repeated.

- Next, still inside the While loop, add a For Each activity, pass List_strings as its input, and change the TypeArgument to String in the Properties panel.
- Inside the For Each loop, use an If activity with this condition:
item.ToString.Equals(List_strings(counter).ToString)
If the condition is true, execution goes to the Then branch, where we can use an Assign activity like this:
occurrence = occurrence + 1
- After the For Each loop, still inside the While loop, use a Write Line activity with this text:
"The word " + List_strings(counter).ToString + " has occurred " + occurrence.ToString + " times."
- Then use an Assign activity to increment the counter:
counter = counter + 1
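
For anyone who wants to check the logic outside UiPath Studio, the steps above can be sketched in Python roughly as follows. This is only an illustration, not a UiPath workflow: the function name count_occurrences is made up for this example, and it returns the counts as a list instead of writing them with Write Line.

```python
def count_occurrences(text):
    """Count how often each word of `text` occurs, mirroring the steps above.

    Hypothetical helper for illustration only; not part of UiPath.
    """
    words = text.split(" ")          # Split("yourinputstring", " ").ToList()
    results = []
    counter = 0
    while counter < len(words):      # While counter < List_strings.Count
        occurrence = 0               # reset the count for the current word
        for item in words:           # For Each item In List_strings
            if item == words[counter]:
                occurrence += 1
        results.append((words[counter], occurrence))
        counter += 1                 # move on to the next word
    return results


if __name__ == "__main__":
    for word, occurrence in count_occurrences("red blue red"):
        print("The word " + word + " has occurred " + str(occurrence) + " times.")
```

Note that a word that appears more than once is reported once per appearance, because the outer loop visits every position in the list.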

I hope this helps.
Please try it and let me know if you have any queries or need clarification.
Cheers @enoondusandeepa


Thanks for the heads-up. I tried this and it returns the number of occurrences of each word. What I am trying to do is check the similarity of two nearly duplicate strings. Based on the result, I plan to perform some activities.

Take a look at the custom activity from Roboyo. It calculates some of the more common fuzzy matching algorithms. Read the license as well, though.
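
To give an idea of what such a fuzzy match does, here is a minimal Python sketch of one common measure, the Levenshtein edit distance, turned into a similarity score between 0 and 1. This is not the Roboyo activity's API, and I am only assuming that Levenshtein is among the algorithms it offers. The function names and the 0.8 threshold are made up for this example, so tune the threshold to your own data.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_near_duplicate(a, b, threshold=0.8):
    """Hypothetical rule: different strings whose similarity reaches
    the (assumed) threshold count as near duplicates."""
    return a != b and similarity(a, b) >= threshold


if __name__ == "__main__":
    print(levenshtein("Sandeep Joice", "Snadeep Joice"))
    print(is_near_duplicate("Sandeep Joice", "Snadeep Joice"))
```

For the two names from the question, the swapped letters count as two substitutions out of 13 characters, so the pair clears the 0.8 threshold and the result can drive an If activity that decides what to do next.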
