Pipeline Train run does not improve ML model

DannyVelkov · February 24, 2023, 1:58pm

Hello,
I have a simple problem with one of the extracted row fields for one of the invoices of the template I'm training a model for. The column value is sometimes missing for every other row, so I retried training the model manually through a pipeline with the 5-6 documents that have errors. I added them through Data Labeling and labelled them accordingly, but after the train run the same problem occurred for those same 5-6 invoices. Any ideas what could cause this bug/issue?

supermanPunch · February 24, 2023, 3:35pm

Hi @DannyVelkov ,

Could you give us some more details on the size of the dataset used for the initial training, and how much of it had the type of data/template that shows the errors you mention?

Also, for the next training/retraining, did you select the initial package version or the latest trained one?

DannyVelkov · February 27, 2023, 1:13am

Well, the dataset is around 500 MB, and there are currently around 50 problem invoices.
I trained the latest version twice. Still the same results.

sharon.palawandram · February 28, 2023, 9:47pm

From my understanding, you’ve only trained on 50 invoices? How many fields are you extracting?

You can go to your Document Manager and check whether you have labelled sufficient data for all your fields. If you have labelled correctly, the indicators should be green, not red.

Note: this also applies if you’re using the Invoices model. The number of invoices you need to train will be smaller, but you should still make sure it’s in the green range.

Next, you should run an evaluation pipeline to check whether all your fields have trained and extracted properly. Once you have done these steps, you will have a clear indication of whether your model is properly trained.

DannyVelkov · March 1, 2023, 9:33am

They are the same fields (around 9 and a table), spanning different layouts for the templates.

sharon.palawandram · March 1, 2023, 9:39pm

How many different layouts do you have? You might have to train around 3-5 samples per layout.

Also, please run an evaluation pipeline. The confidence scores and metrics will really help you understand your model's performance.
Your evaluation dataset should be small, around 20% of the size of your training dataset, but it should cover all variations and consist of unseen samples (ones you have not labelled for training).
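
One way to pick such a holdout set is to split your document list locally, layout by layout, before you upload anything. The snippet below is only a rough sketch under assumptions: the function name `split_by_layout`, the file names, and the layout tags are all made up for illustration, and it does not call any UiPath API. It just sets aside an evaluation set that is about 20% of the size of the training set for each layout, so every layout is represented and no document lands in both sets.

```python
import random
from collections import defaultdict


def split_by_layout(docs, eval_to_train_ratio=0.2, seed=0):
    """Split (filename, layout) pairs into train and evaluation lists.

    Hypothetical helper: the evaluation set is sized at roughly
    eval_to_train_ratio times the training set, per layout.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_layout = defaultdict(list)
    for name, layout in docs:
        by_layout[layout].append(name)

    train, evaluation = [], []
    for layout in sorted(by_layout):
        names = sorted(by_layout[layout])
        rng.shuffle(names)
        # eval = r * train and eval + train = n  =>  eval = n * r / (1 + r)
        n_eval = round(len(names) * eval_to_train_ratio / (1 + eval_to_train_ratio))
        if len(names) > 1:
            # keep at least one evaluation document, but never all of them
            n_eval = min(max(1, n_eval), len(names) - 1)
        else:
            n_eval = 0  # a single document stays in training
        evaluation.extend(names[:n_eval])
        train.extend(names[n_eval:])
    return train, evaluation


# Example with made-up file names for three layouts.
docs = (
    [(f"layout_a_{i}.pdf", "A") for i in range(60)]
    + [(f"layout_b_{i}.pdf", "B") for i in range(30)]
    + [(f"layout_c_{i}.pdf", "C") for i in range(12)]
)
train, evaluation = split_by_layout(docs)
print(len(train), len(evaluation))
```

The evaluation files would then go into their own dataset, unlabelled for training, and be used only for the evaluation pipeline.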

DannyVelkov · March 2, 2023, 8:43am

There are 3 layouts. The newest one is the problem. How can I select a percentage of the invoices to do an evaluation run?

Dr_Anand_Upadhyay · April 10, 2024, 11:23am

@DannyVelkov ,

What kind of problem are you getting? Did your pipeline fail, or are you not getting the expected accuracy?

Thanks
