Comparing two strings and displaying the percentage & highlighting word matches

n3if · February 14, 2024, 5:10am

I am trying to build a content similarity check that compares one string against 100+ existing Excel cells from a database. After comparing the strings, I want to display the similarity percentage and the matching words.

I tried the Levenshtein algorithm. However, after multiple trials it did not return accurate results, because Levenshtein measures the character-level edit distance between two strings and does not look at whole words.

Is there another way to compare two strings word by word, calculate a similarity percentage, and highlight the similar words in both strings, like plagiarism checkers do?
For example:

The project will provide a platform and software solution to assist in planning & scheduling
VS.
The solution will offer a software platform to enhance planning & scheduling

The result should be around 65%,
and the highlighted words: (The, will, software, platform, to, planning, &, scheduling)
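One way to get both a percentage and the words to highlight is to split each text into word tokens and let Python's standard-library `difflib.SequenceMatcher` find matching runs of tokens. This is a minimal sketch, not something proposed in the thread: the helper names `tokenize` and `compare_words` and the `[...]` highlight markers are hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Words, plus standalone symbols such as "&", so they can be matched too.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word/symbol tokens, keeping the original case for display."""
    return TOKEN.findall(text)


def compare_words(text1, text2):
    """Return (percentage, highlighted text1, highlighted text2) using word-level matching."""
    words1, words2 = tokenize(text1), tokenize(text2)
    # Compare case-insensitively, but keep the original tokens for output.
    matcher = SequenceMatcher(None, [w.lower() for w in words1],
                              [w.lower() for w in words2], autojunk=False)
    # ratio() = 2 * matched_tokens / (len(words1) + len(words2))
    percentage = round(matcher.ratio() * 100, 1)

    # Collect token positions covered by each matching block.
    hits1, hits2 = set(), set()
    for block in matcher.get_matching_blocks():
        hits1.update(range(block.a, block.a + block.size))
        hits2.update(range(block.b, block.b + block.size))

    def mark(words, hits):
        return " ".join(f"[{w}]" if i in hits else w for i, w in enumerate(words))

    return percentage, mark(words1, hits1), mark(words2, hits2)


score, marked1, marked2 = compare_words(
    "The project will provide a platform and software solution to assist in planning & scheduling",
    "The solution will offer a software platform to enhance planning & scheduling",
)
print(f"{score}%")
print(marked1)
print(marked2)
```

Because matching blocks have to appear in the same order in both texts, a swapped pair such as "platform and software" vs. "software platform" only counts one of the two words. An order-independent bag-of-words measure (like the cosine approach later in the thread) counts both.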

@ClaytonM , @lakshman , @Palaniyappan

mkankatala · February 15, 2024, 4:23am

Hi @n3if

Instead of the Levenshtein algorithm, use the Jaro-Winkler algorithm; in my tests it gave the best results compared with the other algorithms.
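The post does not include an implementation, so here is a minimal, self-contained Python sketch of the standard Jaro-Winkler similarity (prefix scale 0.1, at most 4 prefix characters). The function names are my own. Note that it is still a character-level measure, so it tends to suit short values such as names better than whole sentences.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; mismatches are half-transpositions.
    half_transpositions, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings that share a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```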

Check the image below for a better understanding.

Hope it helps!!

Vinit_Mhatre · February 15, 2024, 4:24am

Hi @n3if ,

I don't know whether it will work for you, but you can try the Levenshtein distance algorithm:

UiPath Studio compare strings using Levenshtein Distance Algorithm | VB.net code in description - Learn / Video Tutorials - UiPath Community Forum
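For reference, here is a minimal Python sketch of Levenshtein distance (not the VB.NET code from the linked video). Because it works on any sequence, you can pass lists of words instead of strings, which gives a word-level edit distance, closer to what the question asks for. The `words` tokenizer and normalising by the longer length to get a percentage are my assumptions; other normalisations exist.

```python
import re


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any sequences: strings (characters) or lists (e.g. words).
    """
    # Keep only the previous row of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """(1 - distance / longer length) as a percentage; one possible normalisation."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return round((1 - levenshtein(a, b) / longest) * 100, 1)


def words(text):
    """Hypothetical tokenizer: lowercase words plus standalone symbols like '&'."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


print(levenshtein("kitten", "sitting"))  # 3 (character level)
print(similarity_percent(words("The solution will offer a platform"),
                         words("The solution will provide a platform")))  # 83.3
```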

Regards,
Vinit Mhatre

AJ_Ask · February 15, 2024, 6:44am

Hi @n3if

I also faced the same issue with the Levenshtein algorithm. What works for me is this Python script, which compares two texts based on their word frequencies: it turns each text into a vector of word counts and calculates the cosine similarity between these vectors.

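import math
import re
from collections import Counter

# Tokens are runs of word characters (letters, digits, underscore).
WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    """Cosine similarity between two word-count vectors (Counters)."""
    # The dot product only needs the words both texts share.
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Product of the two vector lengths (Euclidean norms).
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty text has no length: avoid division by zero.
    if not denominator:
        return 0.0
    return numerator / denominator


def text_to_vector(text):
    """Count each word; lowercased so that "The" and "the" count as the same word."""
    words = WORD.findall(text.lower())
    return Counter(words)


def get_text(text1, text2):
    """Return the cosine similarity of text1 and text2 (0.0 to 1.0)."""
    vector1 = text_to_vector(text1)
    vector2 = text_to_vector(text2)
    return get_cosine(vector1, vector2)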

String_Compare.zip (473 Bytes)
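The script returns only the score. If you also need the words to highlight, one option (my own extension, not part of String_Compare.zip) is to return the shared vocabulary alongside the cosine value. This sketch lowercases the text and keeps symbols such as "&" as tokens; both are assumptions, and counting "&" changes the score slightly compared with a `\w+`-only tokenizer. The names `cosine_with_matches` and `highlight` are hypothetical.

```python
import math
import re
from collections import Counter

# Words plus standalone symbols such as "&" (assumption: symbols should count too).
TOKEN = re.compile(r"\w+|[^\w\s]")


def cosine_with_matches(text1, text2):
    """Return (cosine similarity as a percentage, sorted list of shared tokens)."""
    vec1 = Counter(TOKEN.findall(text1.lower()))
    vec2 = Counter(TOKEN.findall(text2.lower()))
    shared = set(vec1) & set(vec2)
    numerator = sum(vec1[w] * vec2[w] for w in shared)
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if not norm1 or not norm2:
        return 0.0, []
    return round(numerator / (norm1 * norm2) * 100, 1), sorted(shared)


def highlight(text, shared):
    """Wrap each token of text that appears in shared (lowercase) in [brackets]."""
    return " ".join(f"[{t}]" if t.lower() in shared else t
                    for t in TOKEN.findall(text))


s1 = "The project will provide a platform and software solution to assist in planning & scheduling"
s2 = "The solution will offer a software platform to enhance planning & scheduling"
score, shared = cosine_with_matches(s1, s2)
print(f"{score}%")  # 74.5%
print(highlight(s1, shared))
print(highlight(s2, shared))
```

Unlike an order-based match, this counts "software" and "platform" in both texts even though their order differs; it also marks "a" and "solution", which you may want to filter out with a stop-word list.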

Hope this helps
