I am working on a project where I have a sentence in a string variable and an Excel file. Let’s say the variable is equal to “Internet payment from bank transfer”. The Excel file contains different categories of transaction types. I have to find the most relevant category for this particular sentence.
For this I am using the activity named “Semantic Similarity”. I am attaching an image below to show which activity I am talking about.
As you can see, I have loaded all the categories from the Excel file into an array and I am comparing it to the sentence I get from CurrentRow.Item(“Details”).ToString. Has anyone used this activity and got an idea of how it works? I can’t find anything in the documentation.
Here is an explanation of how the “Semantic Similarity” activity works.
Let us first understand when Semantic Similarity is used.
It is used when you need to measure how similar two pieces of text are in meaning.
In this case, it checks whether the Array of Strings (Transaction Types) contains a string with the same meaning as “Internet payment from bank transfer”.
To understand it better, let us use “String To String” as the Similarity Type.
Here are 3 cases of String to String.
Note: The outputs “Best match String” and “List of similarity score” are not available when the Similarity Type is set to String to String.
Observation: As the Explanation given by the activity shows, the reasoning in Case 1 is completely different from Cases 2 and 3. When “Baseball” and “Cricket” are compared, it could have given a high score because both are sports and involve bats. But it did not come to that conclusion; it explains in detail why it assigned a similarityScore of “0.3”. In Cases 2 and 3, you can clearly see how well it understands the meaning behind both words/sentences given.
Let us observe another case, where both strings are sentences.
They are 2 different sports, but the similarity score is very high because the GenAI model considers the small difference (player count) not significant enough to treat them as 2 different sports.
This raises the main question for your use case: how descriptive is the value in CurrentRow.Item(“Details”).ToString, and how different is each Transaction Type from the others? Let me demonstrate this with Similarity Type set to List of Strings and Output Type set to Best Match.
Case 1: If the input string does not have a clear meaning (so that it matches multiple strings in the array), the Best Match String returned from the array will be different every time. This can also happen if multiple strings in the array have similar meanings.
Key takeaway: The input string must have a clear meaning.
Case 2: If the input string is well defined and the strings in the array are distinct in meaning from each other, but the best-suited match contains contradictory words, you will get the same result every time but with a lower score. (Reason for the correct match: the distinct meaning of each string in the array. Reason for the low score: the conflicting/contradictory words.)
In the example below, which I ran twice, the score is low but the answer is consistent: the word “Deposit” in the main string implies a “Credit” transaction, which makes the answer obvious, but in the only possible match the words “External” and “Domestic” conflict.
Key takeaway: Every string (substring or category) in the array must be distinct in meaning from the others, enough to give a consistent answer for a particular input.
What happens when I fix these issues? The similarity score increases, though not by much. But you can see what I did there. It may not really make sense in the real world, but I guess you can understand how it works.
Can it be increased further? Yes, but the input string has to be even more descriptive and the strings in the array must be very distinct.
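As a rough way to spot categories that may not be distinct enough, here is a minimal sketch in Python. It is only a lexical check (shared words), not a semantic one, and it says nothing about how the activity scores text internally. The helper names, the 0.5 threshold, and the category “DOMESTIC DEBIT TRANSFER” are made up for illustration.

```python
from itertools import combinations


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two category names (0.0 to 1.0)."""
    words_a, words_b = set(a.upper().split()), set(b.upper().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def overlapping_pairs(categories: list[str], threshold: float = 0.5):
    """List category pairs whose shared-word ratio reaches the threshold."""
    return [
        (a, b, round(word_overlap(a, b), 2))
        for a, b in combinations(categories, 2)
        if word_overlap(a, b) >= threshold
    ]


# Illustrative category names only; the second one is invented for the demo.
categories = [
    "DOMESTIC CREDIT TRANSFER",
    "DOMESTIC DEBIT TRANSFER",
    "BANK OVERDRAFT",
]
print(overlapping_pairs(categories))
```

Pairs flagged this way are candidates for rewording so that each category carries a clearly different meaning. Categories can still be close in meaning without sharing any words, so an empty result does not prove they are distinct.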
If you feel that this activity does not work well for your use case, you can try the Categorize activity, another GenAI activity, which gives you the flexibility to describe each category.
I hope this explanation helps. If so, do mark it as a Solution. Thank you. Happy Automation!
The first issue I see is that for the array of strings you passed array.ToString. Calling ToString on an array returns the type name (System.String[]) rather than the values, so you are comparing against that type name. You need to pass the values of the array as a delimited string, so pass String.Join(",", array). This produces a comma-separated string of the actual values in the array.
As for how it works, @V_Roboto_V has given a very good explanation, so I don’t think I need to explain that.
The reason I did it like this is that when I remove the .ToString it gives me the error below. I don’t know why. I chose the Similarity Type as List of Strings, but instead of a list it expects a string variable. That’s why I was confused about its use.
Thank you for your explanation in your comment. I appreciate that a lot!
Let me give an example of mine, which I am running now.
In the first comparison field I insert the sentence “1Bank - Transfer-Internet-Debit Topup”. In the second one I insert the array, corrected as you told me. The sentence that I expect to be the best match is “ACCRUED INTERNET SERVICES”. But instead I get “BANK OVERDRAFT”, with a score of 0.4.
Is this because the sentences are not clearly defined and don’t have a distinct meaning?
In my case I can’t use the “Categorize” activity because the array that I am giving as the second comparison input is not fixed and changes all the time.
If Semantic Similarity gives you incorrect answers too often, you can repurpose the Content Generation activity to do the job of Semantic Similarity. You will also get the flexibility to choose the model.
"You are a Categoriser that categorises the Transaction Detail to a Transaction Type based on Semantic Similarity." + Environment.NewLine +
"You will get the Transaction Detail and the list of Transaction Types as Input." + Environment.NewLine +
"Your Output must strictly be from the list of Transaction Types provided and not outside it. You are not allowed to give any explanation. Just the Output." + Environment.NewLine +
"For Example: Your Input is:" + Environment.NewLine +
"Transaction Detail: " + "Bank transfer through SWIFT" + Environment.NewLine +
"Transaction Types: " + "DOMESTIC CREDIT TRANSFER,INTERNATIONAL SWIFT TRANSFER,INTERNAL COMPANY TRANSFER" + Environment.NewLine +
"Your output will be:" + Environment.NewLine +
"INTERNATIONAL SWIFT TRANSFER"
I hope this helps. I know the question may be “Why use Content Generation? What is the point of Semantic Similarity then?” It is just that Content Generation gives you more flexibility. You can give strict rules in the System Prompt to get the desired output consistently. There is also Context Grounding, which gives you even more control: even if the Transaction Types change, you can explain the most common types or the ones that are commonly confused, thereby getting better-grounded results.
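Since the list of Transaction Types changes between runs, the prompt pattern described above can be split into a fixed instruction part and a per-row input part. Here is a minimal sketch of that split, written in Python rather than as UiPath expressions. The helpers `build_system_prompt` and `build_user_prompt` are hypothetical names, not part of any UiPath activity; in Studio the two resulting strings would be passed to the prompt fields of Content Generation.

```python
def build_system_prompt() -> str:
    """Fixed instructions plus one worked example, following the prompt above."""
    lines = [
        "You are a Categoriser that categorises the Transaction Detail "
        "to a Transaction Type based on Semantic Similarity.",
        "You will get the Transaction Detail and the list of Transaction Types as Input.",
        "Your Output must strictly be from the list of Transaction Types provided "
        "and not outside it. You are not allowed to give any explanation. Just the Output.",
        "For Example: Your Input is:",
        "Transaction Detail: Bank transfer through SWIFT",
        "Transaction Types: DOMESTIC CREDIT TRANSFER,INTERNATIONAL SWIFT TRANSFER,"
        "INTERNAL COMPANY TRANSFER",
        "Your output will be:",
        "INTERNATIONAL SWIFT TRANSFER",
    ]
    return "\n".join(lines)


def build_user_prompt(detail: str, transaction_types: list[str]) -> str:
    """Per-row input: the detail text plus whatever categories exist right now."""
    # Strip stray whitespace and drop empty cells read from the Excel column.
    cleaned = [t.strip() for t in transaction_types if t and t.strip()]
    return (
        "Transaction Detail: " + detail.strip() + "\n"
        + "Transaction Types: " + ",".join(cleaned)
    )


user_prompt = build_user_prompt(
    "1Bank - Transfer-Internet-Debit Topup",
    ["ACCRUED INTERNET SERVICES", "BANK OVERDRAFT", " "],
)
print(user_prompt)
```

Keeping the input in the same “Transaction Detail / Transaction Types” layout as the worked example gives the model one consistent format to follow, whatever the current category list contains.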
How can I use this activity? I am a little lost. I tried to insert a phrase and created a variable for the top generated text, but it seems to have divided the categories I provided into 2 other categories and answered generally without mentioning a specific category.
Should the prompt be written as a sentence, like when writing to ChatGPT?
As you can see, in the System Prompt I have given strict orders like “Your Output must strictly be from the list of Transaction Types provided and not outside it. You are not allowed to give any explanation. Just the Output.” These instructions are necessary to confine the model to answering the way we want.
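Even with strict instructions, a model can still reply with something outside the list, as reported earlier in this thread where the answer was general and named no specific category. A minimal sketch of a check on the reply, in Python, could look like this. The function `match_allowed_type` is a hypothetical helper, not a UiPath feature; the same comparison could be done in the workflow before the result is used.

```python
from typing import Optional


def match_allowed_type(model_reply: str, transaction_types: list[str]) -> Optional[str]:
    """Return the category the reply names, or None if it is not in the list."""
    # Compare case-insensitively and ignore surrounding whitespace or quotes.
    normalised = model_reply.strip().strip('"').strip().upper()
    for candidate in transaction_types:
        if candidate.strip().upper() == normalised:
            return candidate
    return None


types = ["ACCRUED INTERNET SERVICES", "BANK OVERDRAFT"]
print(match_allowed_type(' "bank overdraft" ', types))
print(match_allowed_type("This looks like a general banking fee.", types))
```

A `None` result signals that the reply should not be trusted as a category, so the row can be retried or routed for manual review instead.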
Also, I have given the model one example, so that it has no confusion about the input and output format.
I am now trying this activity and I entered all the information as you showed me. Even the prompts are the same. I just changed the variables in order to pass the correct values, but I still don’t get the desired result. I tried all the provided models.
I guess that 51 Types are too many for the model to pinpoint the correct Type. Or, as I explained initially regarding Semantic Similarity, the Transaction Types need to be very distinct.
You can see by the explanation given by ChatGPT as well: