Semantic Similarity studio activity

Hello there.

I am working on a project where I have a sentence in a string variable and an Excel file. Let’s say the variable is equal to “Internet payment from bank transfer”. The Excel file contains different categories of transaction types, and I have to find the most relevant category for this particular sentence.

For this I am using the activity named “Semantic Similarity”. I am attaching an image below to show which activity I am talking about.

As you can see, I have loaded all the categories from the Excel file into an array, and I am comparing it to the sentence I get from CurrentRow.Item("Details").ToString. Has anyone used this activity and has any idea how it works? I can’t find anything in the documentation.

Could you please help me?
Thank you in advance!


Hi @aikaterini.karakasidi

Here is an explanation of how the “Semantic Similarity” activity works.

Let us first understand when Semantic Similarity is used.
It is used when you need to measure how similar two pieces of text are in meaning.
In this case, it checks whether the array of strings (transaction types) contains a string with the same meaning as “Internet payment from bank transfer”.

To understand it better, let us use “String To String” as the Similarity Type.
Here are 3 cases of String to String:
Note: The outputs “Best match String” and “List of similarity score” are not available when the Similarity Type is set to String to String.
Case 1


Case 2


Case 3:


Observation: As we can see from the explanation given by the activity, the reasoning in case 1 is quite different from cases 2 and 3. When “Baseball” and “Cricket” are compared, it could have given a high score because both are sports and involve bats. But it did not come to that conclusion, and it explains in detail why it assigned a similarityScore of “0.3”. In cases 2 and 3, you can clearly see how well it understands the meaning behind both words/sentences given.

Let us observe another case, where both strings are sentences:


They are 2 different sports, but the similarity score is very high because the GenAI model considers the small difference (player count) not significant enough to treat them as 2 different sports.

This raises the main question for your use case: how descriptive is the value in CurrentRow.Item("Details").ToString, and how different is each transaction type from the others? Let me demonstrate this with Similarity Type set to List of Strings and Output Type set to Best Match.

Case 1: If the input string does not have a clear meaning (so that it matches multiple strings in the array), then the Best Match String from the array can be different on every run. This can also happen if multiple strings in the array have similar meanings among themselves. Key takeaway: the input string must have a clear meaning.


Case 2: If the input string is well defined and the strings in the array are distinct in meaning from each other, but the best-suited match contains contradictory words within itself, then you will get the same result every time but with a lower score. (Reason for the correct match: the distinct meaning of each string in the array. Reason for the low score: the conflicting/contradictory words.) In the example below, which I ran twice, the score is low but the answer is consistent. The word “Deposit” in the main string implies a “Credit” transaction, which makes the answer obvious, but in the only possible match the words “External” and “Domestic” conflict. Key takeaway: every string (substring or category) in the array must be distinct enough in meaning from the others to give a consistent answer for a particular input.


What happens when I fix these issues? The similarity score increases, though not by much. You can see what I did there. It may not really make sense in the real world, but I guess you can understand how it works.



Can it be increased further? Yes, but the input string has to be even more descriptive and the strings in the array must be very distinct.

If you feel that this activity does not work well for your use case, you can try the Categorize activity, another GenAI activity, which gives you the flexibility to describe each category.

I hope this explanation helps. If so, do mark it as a Solution.
Thank You
Happy Automation :star_struck:


@aikaterini.karakasidi

The first issue I see is that in the array of strings field you gave array.ToString. Calling ToString on an array returns the type name (System.String[] for a string array) rather than the values, so you are comparing against that text. You need to pass the array of strings as a single delimited string, so pass String.Join(",", array). This ensures the input is a comma-separated string of the actual values in the array.

As for how it works, there is a very good explanation from @V_Roboto_V, so I don’t think I need to explain that.

cheers


The reason I did it like this is that when I remove the .ToString it gives me the error below, and I don’t know why. I chose List of Strings as the similarity type, but instead of a list the field expects a string variable. That’s why I was confused about its use.

Is there anything I can do to fix this error?


Hi @aikaterini.karakasidi

You have to convert the array of strings to a single string with a comma as the delimiter.

When you pass the array directly, it gives an error because the field expects a String, as you correctly said.

But Array.ToString does not convert it to a String in the format that the activity expects.

You must use String.Join(",", Array)

You can look at the Immediate panel below to see the difference:
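To reproduce the difference outside Studio, here is a minimal VB.NET console sketch. The module name and the two sample categories are only for illustration; in the workflow the array comes from the Excel file.

```vb
Imports System

Module JoinDemo
    Sub Main()
        ' Sample categories for illustration only.
        Dim categories As String() = {"ACCRUED INTERNET SERVICES", "BANK OVERDRAFT"}

        ' ToString on an array returns the type name, not the elements.
        Console.WriteLine(categories.ToString())          ' System.String[]

        ' String.Join concatenates the elements with the given delimiter.
        Console.WriteLine(String.Join(",", categories))   ' ACCRUED INTERNET SERVICES,BANK OVERDRAFT
    End Sub
End Module
```

In the activity field itself only the expression String.Join(",", yourArray) is needed. Note that a category which itself contains a comma would probably be read as two entries; that is an assumption about how the activity splits the string, so it is worth checking with your data.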


Hey @aikaterini.karakasidi

Is your question answered? If you have further queries, please ask. I’ll gladly help.

If your issue is resolved, do mark the solution that helped and close the thread.
Thank You,
Happy Automation :star_struck:

Thank you for your explanation in your comment. I appreciate that a lot!

Let me give an example of mine, which I am running now.
In the first comparison field I insert the sentence “1Bank - Transfer-Internet-Debit Topup”. In the second one I insert the array, corrected as you told me. The sentence that I expect to be the best match is “ACCRUED INTERNET SERVICES”, but instead I get “BANK OVERDRAFT”, with a score of 0.4.

Is this because both sentences are not clearly defined and don’t have a distinct meaning?

In my case I can’t use the “Categorize” activity because the array that I am giving as the second comparison input is not fixed and changes all the time.

Could you check the explanation given by the activity? It will give us an idea of its analysis.


As you can see in the last sentence, the reason for confusion:

How many transaction types do you have in that array?

Yes I can see the reason now!!

I have 51, and it won’t always be 51 because they keep changing.

Hey @aikaterini.karakasidi

When I ask ChatGPT the same thing directly, it categorises it correctly.

If Semantic Similarity is giving you incorrect answers too often, you can repurpose the Content Generation activity to do the job of Semantic Similarity. You will get the flexibility to choose the model as well.

Hi @aikaterini.karakasidi

Here, I have repurposed a Content Generation activity, and it gives the correct results. You can try it and let me know.

Assign Activity before the Content Generation

transactionDetail = "1Bank - Transfer-Internet-Debit Topup"
stringCategories = "ACCRUED INTERNET SERVICES,BANK OVERDRAFT"

The Content Generation Activity:

Prompt:

"Transaction Detail: " + transactionDetail + "\n" +
"Transaction Types:" + stringCategories

System Prompt:

"You are a Categoriser that categorises the Transaction Detail to a Transaction type Based on Semantic Similarity." + "\n" +
"You will get the Transaction Detail and the list of Transaction Types as Input" + "\n" + 
"Your Output must strictly be from the list of Transaction Types provided and not outside it. You are not allowed to give any explanation. Just the Output." + "\n" + 
"For Example: Your Input is:" + "\n" +
"Transaction Detail: " + "Bank transfer through SWIFT" + "\n" +
"Transaction Types:" + "DOMESTIC CREDIT  TRANSFER,INTERNATION SWIFT TRANSFER,INTERNAL COMPANY TRANSFER" + "\n" +
"You output will be:" + "\n" +
"INTERNATION SWIFT TRANSFER"

Output Comparison:
bestMatch - Semantic Similarity, topResult - Content Generation

I hope this helps. I know the question may be “Why use Content Generation? What is the point of Semantic Similarity then?” It is just that Content Generation gives you more flexibility. You can even give strict rules in the System Prompt and get the desired output every time. There is also Context Grounding, which gives you even more control: even if the transaction types change, you can explain the most common types or the ones that are commonly confused, thereby getting better grounded results.
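One thing to check in the prompt expressions above, assuming the project uses VB expressions: in VB.NET "\n" is not an escape sequence, so it is sent as a backslash followed by the letter n. The model will often cope with that, but vbLf or Environment.NewLine gives a real line break. A minimal sketch, reusing the variable names from the Assign step; the module name is made up:

```vb
Imports System
Imports Microsoft.VisualBasic

Module PromptDemo
    Sub Main()
        Dim transactionDetail As String = "1Bank - Transfer-Internet-Debit Topup"
        Dim stringCategories As String = "ACCRUED INTERNET SERVICES,BANK OVERDRAFT"

        ' vbLf inserts a real line feed between the two parts of the prompt.
        Dim prompt As String = "Transaction Detail: " & transactionDetail & vbLf &
                               "Transaction Types: " & stringCategories

        Console.WriteLine(prompt)

        ' In VB.NET "\n" is two literal characters, while vbLf is one.
        Console.WriteLine("\n".Length)   ' 2
        Console.WriteLine(vbLf.Length)   ' 1
    End Sub
End Module
```

If the project uses C# expressions instead, "\n" already is a line break and no change is needed.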

I hope this helps


How can I use this activity? I am a little lost. I tried to insert a phrase and created a variable for the top generated text, but it seems to have divided the categories I provided into 2 other categories and answered generally without mentioning a specific category.

Should the prompt be in the form of a sentence, like writing to ChatGPT?

Yes. The Content Generation activity is just like talking to ChatGPT.
You can try using the prompts I have given and experiment with them.

The System Prompt and Prompt need to be very well formatted, and the instructions need to be clear. You should get good results with it.


As you can see in the System Prompt, I have given strict orders like “Your Output must strictly be from the list of Transaction Types provided and not outside it. You are not allowed to give any explanation. Just the Output.” These instructions are necessary to confine the model to answering the way we want.
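Because the prompt can only ask the model to stay within the list, it may still help to verify the answer in the workflow before using it. Here is a minimal VB.NET sketch of such a check, assuming the result is stored in topResult and the categories in the comma-separated stringCategories. IsKnownCategory is a hypothetical helper, not part of any UiPath activity:

```vb
Imports System
Imports System.Linq

Module ValidateDemo
    ' Hypothetical helper: True only if the model's answer is one of the supplied categories.
    Function IsKnownCategory(topResult As String, stringCategories As String) As Boolean
        If String.IsNullOrWhiteSpace(topResult) Then Return False

        ' Split the comma-separated list and ignore surrounding spaces.
        Dim categories = stringCategories.Split(","c).Select(Function(c) c.Trim())

        ' Case-insensitive membership test.
        Return categories.Contains(topResult.Trim(), StringComparer.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Dim stringCategories As String = "ACCRUED INTERNET SERVICES,BANK OVERDRAFT"

        Console.WriteLine(IsKnownCategory("BANK OVERDRAFT", stringCategories))    ' True
        Console.WriteLine(IsKnownCategory("Internet payment", stringCategories))  ' False
    End Sub
End Module
```

In a workflow, the same test could sit in an If activity, so that answers outside the list are retried or sent for manual review instead of being written back.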

Also, I have given one example to the model, so that it has no confusion about the input and output format:

"You are a Categoriser that categorises the Transaction Detail to a Transaction type Based on Semantic Similarity." + "\n" +
"You will get the Transaction Detail and the list of Transaction Types as Input" + "\n" + 
"Your Output must strictly be from the list of Transaction Types provided and not outside it. You are not allowed to give any explanation. Just the Output." + "\n" + 
"For Example: Your Input is:" + "\n" +
"Transaction Detail: " + "Bank transfer through SWIFT" + "\n" +
"Transaction Types:" + "DOMESTIC CREDIT  TRANSFER,INTERNATION SWIFT TRANSFER,INTERNAL COMPANY TRANSFER" + "\n" +
"You output will be:" + "\n" +
"INTERNATION SWIFT TRANSFER"

In which property field did you apply your “topResult” variable? In “Top generated text”?


Yes.

You can experiment with the properties if you like, but it is not necessary for your use case. Here is what the various properties do (from the UiPath documentation):

Thank you!!

I am now trying this activity and I entered all the information as you showed me. Even the prompts are the same. I only changed the variables in order to get the correct values, but I still don’t get the desired result. I tried all the provided models.


I guess that 51 types are too many for the model to pinpoint the correct type.
Or,
as I explained initially regarding Semantic Similarity, the transaction types need to be very distinct.
You can see this in the explanation given by ChatGPT as well:

You can try another way (time-consuming to develop): train a model and load it as an ML Skill.