I am trying to use the UiPath supported ML packages, specifically the LightTextClassification package as per the documentation here:
I used an example CSV containing a basic sentiment analysis dataset, with every entry labelled positive or negative, and ran a training pipeline on it.
The problem is that when I deploy it as a skill, I don’t get the responses shown in the documentation. Instead of a result like
{
"prediction": "Positive",
"confidence": 0.9422031841278076
}
every single entry returns a ‘class’ of zero, although the confidence does change.
If I use the EnglishTextClassification ML package and train it on the same file, everything works correctly and I get a JSON response with the prediction and the confidence.
Can someone tell me what I’m doing wrong or what is wrong in the documentation? Or is the Package incorrect?
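For anyone wanting to spot this symptom automatically, here is a minimal Python sketch that checks whether a skill's output has the documented keys. The helper name `is_documented_response` is made up for illustration, and the "bad" payload is only a guess at what the faulty output looks like based on the description above (a ‘class’ of zero plus a confidence), with an invented confidence value.

```python
import json

# Keys shown in the documented response example.
EXPECTED_KEYS = {"prediction", "confidence"}


def is_documented_response(raw):
    """Return True if the raw JSON string has the documented keys."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and EXPECTED_KEYS.issubset(payload)


# Documented shape, copied from the post above.
good = '{"prediction": "Positive", "confidence": 0.9422031841278076}'
# ASSUMPTION: illustrative shape and value for the faulty output.
bad = '{"class": 0, "confidence": 0.61}'

print(is_documented_response(good))
print(is_documented_response(bad))
```

This only checks the shape of the reply; it says nothing about whether the prediction itself is sensible.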
The issue I encountered was related to inconsistent behaviour between similar ML skills. EnglishTextClassification seems to be much better: it handles any CSV file (as well as other formats) you provide in your dataset and lets you set the input parameters for the ‘input’ and ‘target’ columns. When it does fail, it gives you good details.
LightTextClassification, on the other hand, needs a file in a very specific format. You have to use the exact column names AND the exact file name given in the instructions, “dataset.csv”.
If you provide a differently named file, it still appears to ‘train’, but then returns the results I described above.
Please try again with the input file as I described and I think you’ll have the same success as I did.
Partly my fault for not reading the instructions in such fine detail, partly the model’s fault for having poor error handling, I think.
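Since the package apparently trains without complaint on a wrongly named file, a quick local check before uploading might save a pipeline run. The sketch below is only that, a sketch: `check_training_file` is a hypothetical helper, it assumes a UTF-8 CSV with a header row, and it only tests the two requirements mentioned in this thread (file name “dataset.csv”, columns input and target).

```python
import csv
from pathlib import Path

# Requirements as described in this thread for LightTextClassification.
REQUIRED_NAME = "dataset.csv"
REQUIRED_COLUMNS = ("input", "target")


def check_training_file(path):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    p = Path(path)
    if p.name != REQUIRED_NAME:
        problems.append(f"file is named {p.name!r}, expected {REQUIRED_NAME!r}")
    # ASSUMPTION: the file is UTF-8 and its first row is the header.
    with p.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    for col in REQUIRED_COLUMNS:
        if col not in header:
            problems.append(f"missing column {col!r} (found {header})")
    return problems
```

Calling `check_training_file("train.csv")` on a file with other column names would return one message for the file name and one per missing column.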
I can’t use the EnglishTextClassification model as my input can be in multiple languages. I’d love to use the MultiLingualTextClassification, but I get errors when I try to deploy that one.
So what you’re saying is that my csv file to train should be called “dataset.csv” and not “train.csv”?
And it should have 2 columns, input & target?
That’s the same reason I was looking to use the other skills: non-English data as well.
Yes, try changing your filename to that and make sure the columns are input and target.
It’s in the documentation page but, as I mentioned, the other skills don’t have these hard requirements, so it’s confusing when trying out different ones and getting these inconsistent results.
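If the existing file has different column names, the rename can be scripted rather than done by hand. This is a rough sketch, not anything from the UiPath docs: `convert_to_dataset_csv`, `text_col` and `label_col` are names invented here, the source is assumed to be a UTF-8 CSV with a header row, and I don’t know whether the column order matters to the package.

```python
import csv
from pathlib import Path


def convert_to_dataset_csv(src, text_col, label_col, out_dir="."):
    """Copy two columns of `src` into <out_dir>/dataset.csv as input,target."""
    out_path = Path(out_dir) / "dataset.csv"
    if Path(src).resolve() == out_path.resolve():
        raise ValueError("source and output are the same file")
    # ASSUMPTION: UTF-8 encoding and a header row in the source file.
    with open(src, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=["input", "target"])
        writer.writeheader()
        for row in reader:
            writer.writerow({"input": row[text_col], "target": row[label_col]})
    return out_path
```

For example, `convert_to_dataset_csv("train.csv", "text", "label")` would write a dataset.csv next to the script, assuming train.csv has columns called text and label.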
I’ve only done a basic proof of concept on it. I need a lot more data from my customer to stress test the model, so I’d be interested in hearing your results with non-English languages as well.
If I have enough licences I might try doing a comparison between the English one, LightText and the Text one (I think there are 3) to see if I get wildly different results with this language.
Perhaps if one of us can get our hands on a ‘sentiment analysis’ dataset in a non-English language we could see. I have one in English that maybe could be translated… it’s 1000 entries, however, so it would have to be done in batch rather than by hand, which relies on Google Translate being accurate.
@Senne_Symons Ah, I realize I wasn’t clear, Senne. I meant I only did a POC with the LightTextClassificationSkills. I have also tested other unrelated skills, but I haven’t done the multi-language ones.
@Jeremy_Tederry
Are you referring to the original issue I posted with the LightTextClassification? As I explained, it behaves this way if you don’t name the file dataset.csv.
Here is a file I was testing with; it is a basic sentiment analysis of some text. If you name the file whatever.csv you should get the same results as I reported in my initial post (no errors); if you name it dataset.csv then it works. dataset2.zip (20.1 KB)