AI Center Training Pipeline Strategy For Document Understanding Packages

If a model is trained on invoices for companies A, B and C, and is later trained on invoices for company D (probably from a separate dataset), does the learning take company D into account?

Does it supersede what the model previously learned from A, B and C, or do a few documents from A, B and C need to be included when training on company D?

Whenever existing documents change or new documents arrive, do not start retraining the model right away.

  • First test the new documents with the existing model. If it gives good results, there is no need to change anything. If it gives decent results, keep using the current version and gather more data during validation, using Validation Station or Action Center.
  • If the results are poor, add the new documents (company D invoices) to the older dataset (companies A, B and C), so that it contains all the documents, i.e. A, B, C and D.
  • Then train the base model, i.e. "v.0", on this combined dataset. This generates model version "v.2", which should give better results.

Here, "v.0" is the base model version, and "v.1" is the version previously trained on companies A, B and C.
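
The steps above can be sketched as a small decision routine. This is only an illustration under stated assumptions: the text speaks of "good", "decent" and poor results without giving numbers, so the thresholds 0.90 and 0.75 are hypothetical, as are the names field_accuracy, choose_action and merged_dataset. None of this is AI Center API; it only mirrors the reasoning.

```python
from enum import Enum


class Action(Enum):
    KEEP_CURRENT = "keep the current model, change nothing"
    KEEP_AND_COLLECT = "keep the current model, collect validated data"
    RETRAIN_FROM_BASE = "merge datasets and retrain the base model v.0"


# Hypothetical cut-offs: the text does not define "good" or "decent" numerically.
GOOD_THRESHOLD = 0.90
DECENT_THRESHOLD = 0.75


def field_accuracy(predictions, ground_truth):
    """Fraction of expected fields, across all documents, extracted exactly."""
    total = 0
    correct = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field, value in expected.items():
            total += 1
            if predicted.get(field) == value:
                correct += 1
    return correct / total if total else 0.0


def choose_action(accuracy):
    """Map the existing model's accuracy on the new documents to a next step."""
    if accuracy >= GOOD_THRESHOLD:
        return Action.KEEP_CURRENT
    if accuracy >= DECENT_THRESHOLD:
        return Action.KEEP_AND_COLLECT
    return Action.RETRAIN_FROM_BASE


def merged_dataset(existing_docs, new_docs):
    """Combine old and new document ids, dropping duplicates, keeping order."""
    return list(dict.fromkeys(existing_docs + new_docs))
```

For example, if the existing model extracts two of three expected fields correctly on company D invoices, choose_action returns Action.RETRAIN_FROM_BASE, and merged_dataset would then be used to build the A, B, C and D dataset for the training run.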

It is always recommended to retrain on top of the base model with the larger, combined dataset rather than to train "v.1" further on the smaller dataset (company D only). Training on the larger dataset exposes the model to more patterns, so it generally performs better overall.