LightTextClassification ML Skill


I am trying to use the UiPath supported ML packages, specifically the LightTextClassification package as per the documentation here:

I used an example CSV, basically a sentiment analysis dataset with every entry labelled positive or negative, and ran a training pipeline on it.
The problem is that when I deploy it as a skill, I don’t get the responses shown in the documentation. Instead of a result like
“Positive”, “confidence”: 0.9422031841278076

I am getting

“class”: “0”,
“confidence”: 0.84942038749394477

Every single entry returns a ‘class’ of “0”, although the confidence does change.
If I use the EnglishTextClassification ML package and train it on the same file, everything works correctly and I get a JSON response with the prediction and the confidence.

Can someone tell me what I’m doing wrong, or what is wrong in the documentation? Or is the package itself incorrect?

I can’t help you with a solution, but I can confirm I’m having the exact same issue with the LightTextClassification skill. 🙂

I should have followed this up.

The issue I encountered was related to inconsistent behaviour between similar ML skills. EnglishTextClassification seems to be much better: it handles any CSV file (as well as other formats) you provide in your dataset and lets you set parameters for the ‘input’ and ‘target’ columns. When it does fail, it gives you good details.

LightTextClassification, on the other hand, needs a file in a very specific format. You have to use the exact column names AND the exact file name given in the instructions, “dataset.csv”.
If you put in a different file it still seems to ‘train’, but the skill then returns the results I described above.

Please try again with the input file as I describe and I think you’ll have the same success as I did.

Partly my fault for not reading the instructions in such fine detail, partly the model’s fault for having poor error handling, I think.
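
Since the training run does not complain about a wrongly named file, a small pre-flight check before uploading can catch the problem early. The sketch below is only an illustration based on what is described in this thread (file name “dataset.csv”, columns input and target); the function name `check_training_file` is made up for this example and is not part of any UiPath package.

```python
import csv
from pathlib import Path

# Requirements as described in this thread (assumed, not taken from a UiPath API).
REQUIRED_NAME = "dataset.csv"
REQUIRED_COLUMNS = {"input", "target"}


def check_training_file(path):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    path = Path(path)

    # The file name itself has to match exactly.
    if path.name != REQUIRED_NAME:
        problems.append(f"file is named {path.name!r}, expected {REQUIRED_NAME!r}")

    # Read only the header row; utf-8-sig drops a byte-order mark if one is present.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])

    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        problems.append(f"missing column(s): {sorted(missing)}")

    return problems
```

Calling `check_training_file("train.csv")` on a file with columns such as text and label would report both the file name and the missing columns, instead of the silent “class 0” behaviour after deployment.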

Hi Jon,

Thanks for the follow-up!

I can’t use the EnglishTextClassification model as my input can be in multiple languages. I’d love to use the MultiLingualTextClassification, but I get errors when I try to deploy that one.

So what you’re saying is that my training CSV file should be called “dataset.csv” and not “train.csv”?
And it should have two columns, input and target?

That’s the same reason I was looking at the other skills: I have non-English data as well.

Yes, try changing your filename to that and make sure the columns are input and target.
It’s in the documentation page but, as I mentioned, the other skills don’t have these hard requirements, so it’s confusing when you try out different ones and get these inconsistent results.
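
If the existing training file uses other column names, it can be rewritten into the expected shape with a few lines of Python. This is a rough sketch using only the standard library; `write_dataset_csv` and its parameters are hypothetical names for this example, and the target layout (a file called dataset.csv with columns input and target) is the one described in this thread.

```python
import csv
from pathlib import Path


def write_dataset_csv(source_path, text_column, label_column, out_dir="."):
    """Copy two columns of an existing CSV into <out_dir>/dataset.csv,
    renamed to 'input' and 'target'. Returns the path of the new file."""
    out_path = Path(out_dir) / "dataset.csv"

    # Refuse to overwrite the source while it is still being read.
    if Path(source_path).resolve() == out_path.resolve():
        raise ValueError("source file is already dataset.csv in the output folder")

    with open(source_path, newline="", encoding="utf-8-sig") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["input", "target"])  # exact header names from the thread
        for row in reader:
            writer.writerow([row[text_column], row[label_column]])

    return out_path
```

For example, `write_dataset_csv("train.csv", "text", "label", out_dir="upload")` would produce upload/dataset.csv, assuming the upload folder exists and train.csv has columns named text and label.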

Does the LightTextClassification model support multiple languages? Because the documentation is also quite vague on this…

It supports all languages based on Latin characters, such as English, French, Spanish, and others.

Do you happen to have any experience with the MultiLingual model?

I’ve only done a basic proof of concept on it. I need a lot more data from my customer to stress test the model, so I’d be interested in hearing your results with non-English languages as well.

If I have enough licences I might try doing a comparison between the English one, LightText and the Text one (I think there are 3) to see if I get wildly different results with this language.

Perhaps if one of us can get our hands on a ‘sentiment analysis’ dataset in a non-English language, we could see. I have one in English that could maybe be translated… it’s 1000 entries, however, so it would have to be done in batch rather than by hand, which relies on Google Translate being accurate.

Could you share some information about how you trained & deployed the MultiLingual model?

I’m having issues once deployed, as shown in this forum post.

I’ve been waiting for UiPath Support to help but they are slow in responding.

Hi Jon
That’s weird, would you mind sharing your dataset?

@Senne_Symons Ah, I realize I wasn’t clear, Senne. I meant I only did a POC with the LightTextClassification skill. I have also tested other unrelated skills, but I haven’t tried the multi-language ones.

Are you referring to the original issue I posted with LightTextClassification? As I explained, it behaves this way if you don’t name the file dataset.csv.
Here is a file I was testing with; it is a basic sentiment analysis of some text. If you name the file whatever.csv you should get the same results as I reported in my initial post (no errors); if you name it dataset.csv, it works.
Processing: dataset2.csv… (20.1 KB)