Matching a string based on similarity, i.e. not an exact match


#1

Hi guys,

I have a question about partial matching of two strings.

I have a string that I need to check. More specifically, I have OCR output that contains some mistakes. I need to check whether a given string is really there, but because the OCR may have read it incorrectly, a 70% match is enough.

Is it possible to do that in UiPath?

I read the topic “Fuzzy Matching in string comparison”, but it was not helpful.
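One way to express the “70% match” requirement is a similarity ratio compared against a threshold. The sketch below uses Python's standard-library `difflib.SequenceMatcher`, which implements Ratcliff/Obershelp matching. That is a different technique from the Levenshtein approach discussed later in this thread. The function name `is_fuzzy_match` and the 0.7 default are illustrative assumptions.

```python
from difflib import SequenceMatcher


def is_fuzzy_match(expected: str, ocr_text: str, threshold: float = 0.7) -> bool:
    """Return True if ocr_text is at least `threshold` similar to expected.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched
    characters and T is the combined length of both strings.
    """
    ratio = SequenceMatcher(None, expected, ocr_text).ratio()
    return ratio >= threshold


print(is_fuzzy_match("INVOICE", "1NV0ICE"))  # True: two misread characters out of 7
print(is_fuzzy_match("INVOICE", "RECEIPT"))  # False
```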


#2

Hi,

I don’t know if this is a really good answer, but if the misread characters are consistently in a few places, you might be able to use a Regex pattern with OR alternatives to cover several variants.

Using “STRING” as your target string:

System.Text.RegularExpressions.Regex.IsMatch(text, "(S|5)[A-Z0-9]R[A-Z0-9]NG")

“|” means OR. IsMatch returns True or False. Alternatively, use Regex.Match(text, pattern).Value to get the matched text itself.

Maybe something like that will work.
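Here is the same pattern idea as a sketch in Python's `re` module. A character class `[S5]` does the same job as the `S`/`5` alternation in the pattern above. `[A-Z0-9]` accepts any capital letter or digit in the positions where OCR tends to misread `T` and `I`. The helper name `find_ocr_variant` is hypothetical. The pattern does not catch errors in the `R`, `N` or `G` positions.

```python
import re
from typing import Optional

# [S5] allows "S" or its common OCR misread "5"; [A-Z0-9] accepts any
# uppercase letter or digit in the positions of "T" and "I".
PATTERN = re.compile(r"[S5][A-Z0-9]R[A-Z0-9]NG")


def find_ocr_variant(text: str) -> Optional[str]:
    """Return the first substring that looks like "STRING", or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


print(find_ocr_variant("Header 5TR1NG footer"))  # 5TR1NG
print(find_ocr_variant("no match here"))         # None
```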


#3

This looks like it can be done with the Levenshtein distance algorithm. Try the attached workflow and see if it helps.

LevenshteinAlgorithm.xaml (9.1 KB)
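The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the standard dynamic-programming approach. It keeps only two rows of the table instead of the full matrix. The function name `levenshtein` is illustrative and is not taken from the attached workflow.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    # prev[j] holds the distance between the current prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("Word", "Wor"))        # 1
```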


#4

You can try stringVariable.Contains("expected text") and see if it works for you.
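`Contains` is an exact, case-sensitive substring test, so a single OCR mistake makes it fail. Python's `in` operator behaves the same way, as this small sketch shows. The sample strings are made up.

```python
ocr_output = "Total amount: 5TRING 42"

# Exact substring checks succeed only when every character matches.
print("STRING" in ocr_output)  # False: "S" was read as "5"
print("5TRING" in ocr_output)  # True
```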


#5

Hi, can you please paste the code from the Invoke Code activity? I can’t open the workflow because I have an older version of UiPath.
Thanks!
Could not find type ‘InvokeCode’ in namespace ‘http://schemas.uipath.com/workflow/activities’. Row: 95, Column: 8


#6

s = original string argument
t = comparison string argument
r = result (Integer) argument

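' Levenshtein distance between s and t; the result is written to r.
Dim n As Integer = s.Length
Dim m As Integer = t.Length

' d(i, j) = edit distance between the first i characters of s
' and the first j characters of t (indices 0..n and 0..m).
Dim d(n, m) As Integer

' Turning a prefix into an empty string (or vice versa) costs its length.
For i As Integer = 0 To n
    d(i, 0) = i
Next
For j As Integer = 0 To m
    d(0, j) = j
Next

For i As Integer = 1 To n
    For j As Integer = 1 To m
        ' A substitution is free when the two characters already match.
        Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)

        ' Cheapest of deletion, insertion and substitution.
        d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1),
                           d(i - 1, j - 1) + cost)
    Next
Next

' If either string is empty, this equals the other string's length,
' so no separate empty-string check is needed.
r = d(n, m)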

#7

I get the same error. It would be really helpful if you could show us the missing part.


#8

This works in version 2017.1 and later. If you are using 2016.2, you will get the activity error, because the Invoke Code activity isn’t available in that version.


#9

How can I update?


#10

Orchestrator

https://platform.uipath.com


#11

It’s working, and I’m now going through the algorithm. As far as I can tell, it shows how many words are different.
Is it possible to adjust the code to get the percentage of mistakes over the whole string?

Imagine I have only 4 words and each of them contains one mistake. It could still be a 90% match.


#12

Since the word count is calculated for both strings, I think you can calculate the percentage once the result is returned (outside the Invoke Code activity). I didn’t understand the 90% match part.

((wordcount - result)/wordcount)*100


#13

I was talking about characters, not whole words.

For example, if one word is “Word” and the other is “Wor”, your algorithm shows 100% mistakes, but from my point of view it is a 75% match.
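One common way to turn an edit distance into a match percentage is to normalise it by the length of the longer string. This is an assumption about the desired scoring, not necessarily what the thread's workflow does. Here $d(s, t)$ is the Levenshtein distance and $|s|$ the length of $s$. The second line works through the “Word”/“Wor” example.

```latex
\text{similarity}(s, t) = \left(1 - \frac{d(s, t)}{\max(|s|, |t|)}\right) \times 100\%

d(\text{Word}, \text{Wor}) = 1 \quad\Rightarrow\quad \left(1 - \tfrac{1}{4}\right) \times 100\% = 75\%
```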


#14

You could use a loop to compare the characters, or preferably LINQ.

Basically, go through each character, add 1 whenever they match, then divide the sum by the total to get the percentage.
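Here is a minimal Python sketch of the loop described above. It compares characters position by position and uses the longer string's length as the total. That choice of total is an assumption, and the function name is hypothetical. The second call shows a weakness of purely positional comparison: a single missing character in the middle shifts every later position. The Levenshtein distance handles that case better.

```python
def positional_match_percent(a: str, b: str) -> float:
    """Percentage of positions where a and b hold the same character.

    Characters are compared index by index; the longer string's length is
    used as the total, so missing characters count as mismatches.
    """
    total = max(len(a), len(b))
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / total * 100


print(positional_match_percent("Word", "Wor"))  # 75.0
print(positional_match_percent("Word", "Wrd"))  # 25.0
```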

If I come up with a LINQ solution, I’ll post it.


#15

Something like this, but with Except instead of Intersect?
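In LINQ, `Intersect` and `Except` are set operations. They work on distinct elements and ignore order. Below is a rough Python equivalent using built-in sets, as a sketch with illustrative names. Because duplicates and order are dropped, “123456” and “654321” would look identical under this comparison.

```python
expected = "123456"
ocr = "123406"

common = set(expected) & set(ocr)   # like Intersect: distinct shared characters
missing = set(expected) - set(ocr)  # like Except: characters of expected not in ocr

print(sorted(common))   # ['1', '2', '3', '4', '6']
print(sorted(missing))  # ['5']
```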


#16

I modified it a bit (letter comparison).

LevenshteinAlgorithm.xaml (15.4 KB)


#17

Wow, looks great!
One more question: I have an invoice and need to check that it contains a particular number. The number can be anywhere, and of course there can be other numbers too. How would you check it?
Imagine an invoice with text:
bla blabla blabla blabla 23.11.2017blabla blabla blabla blabla blabla blabla blabla blabla blabla bla bla blabla bla 123406 bla blabla blabla blabla bla

I need to check whether it contains 123456, or at least a 70% match of it. Is that doable?
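Here is one possible approach, sketched in Python under stated assumptions. First, pull every run of digits out of the text with a regular expression. Then score each run against the expected number and accept the best one if it reaches 70%. The scoring uses the standard-library `difflib.SequenceMatcher.ratio()`; a Levenshtein-based percentage would slot in the same way. The function name `best_number_match` is hypothetical. Splitting on digit runs is also an assumption, and it fails if OCR breaks a number with spaces or letters.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple


def best_number_match(text: str, expected: str,
                      threshold: float = 0.7) -> Optional[Tuple[str, float]]:
    """Return (candidate, score) for the digit run most similar to expected,
    or None if no candidate reaches the threshold."""
    best: Optional[Tuple[str, float]] = None
    for candidate in re.findall(r"\d+", text):
        score = SequenceMatcher(None, expected, candidate).ratio()
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


invoice = "bla blabla 23.11.2017blabla bla 123406 bla blabla"
result = best_number_match(invoice, "123456")
if result:
    number, score = result
    print(f"Found {number} ({score:.0%} match)")  # Found 123406 (83% match)
```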


#18

Thanks, vvaidya! I couldn’t get Except and Intersect to work, though I probably did something wrong.

Here is an alternative that I came up with:

(str1.Where(Function(s) str2.Contains(s)).Count / str2.Count) * 100

That returns the number of characters of str1 that also occur in your main string str2, as a percentage of str2’s length, so you can use it in an If condition.
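Here is the same calculation as a Python sketch; the function name is illustrative. Two caveats follow from how the expression is written. First, order and position are ignored: each character of `str1` only has to appear somewhere in `str2`. Second, the result can exceed 100% when `str1` is longer than `str2`.

```python
def contained_char_percent(str1: str, str2: str) -> float:
    """Port of (str1.Where(Function(s) str2.Contains(s)).Count / str2.Count) * 100:
    count characters of str1 that occur anywhere in str2, relative to len(str2)."""
    if not str2:
        raise ValueError("str2 must not be empty")
    hits = sum(1 for ch in str1 if ch in str2)
    return hits / len(str2) * 100


print(contained_char_percent("Wor", "Word"))    # 75.0
print(contained_char_percent("droW", "Word"))   # 100.0, although the order is reversed
print(contained_char_percent("WWWWW", "Word"))  # 125.0, str1 is longer than str2
```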

Regards, @mario


#19

Never mind, @ClaytonM already replied!