If a model is trained on invoices for companies A, B, and C, and is later trained on invoices for company D using a different dataset, does the training take company D's addition into account? Does it override the logic previously learned from A, B, and C, or do a few documents from A, B, and C need to be included when training with company D's data?
Whenever existing documents change or new documents are introduced, do not start retraining the model immediately. Follow these steps instead:
1. Test with the Existing Model:
- Before making any changes, run the current model on the new documents (company D invoices).
- If the results are good, no further training is needed.
- If the results are decent but not optimal, keep using the current version and collect more data through human validation (for example, in Validation Station or Action Center).
2. Evaluate the Results:
- If the results are not satisfactory, add the new documents (company D invoices) to the existing dataset (company A, B, and C invoices), so that the combined dataset contains documents from A, B, C, and D.
3. Retrain the Base Model:
- Train the base model (v.0) on this combined, larger dataset. This produces a new model version (v.2), which is expected to perform better.
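The three steps above can be sketched as a small decision procedure. This is a minimal illustration under stated assumptions, not a product API: the thresholds `GOOD_THRESHOLD` and `DECENT_THRESHOLD`, the helpers `field_accuracy`, `next_action`, and `build_training_set`, the `Document` type, and the sample invoice data ("D Corp", file names) are all hypothetical. Field-level exact-match accuracy is used here only as one possible way to judge "good" versus "decent" results.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the steps only say "good" and "decent",
# so these numbers are illustrative assumptions, not recommended values.
GOOD_THRESHOLD = 0.95
DECENT_THRESHOLD = 0.80


@dataclass(frozen=True)
class Document:
    company: str  # e.g. "A", "B", "C", "D"
    path: str     # hypothetical file path of a labeled invoice


def field_accuracy(predictions, ground_truth):
    """Share of labeled fields the current model extracted exactly right."""
    total = correct = 0
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


def next_action(score):
    """Map an evaluation score to step 1 (keep / keep and collect) or steps 2-3 (retrain)."""
    if score >= GOOD_THRESHOLD:
        return "keep current model"
    if score >= DECENT_THRESHOLD:
        return "keep current model, collect validated data"
    return "merge datasets, retrain base model"


def build_training_set(old_docs, new_docs):
    """Combine the A/B/C documents with the D documents, skipping exact duplicates."""
    seen = set()
    combined = []
    for doc in [*old_docs, *new_docs]:
        if doc not in seen:
            seen.add(doc)
            combined.append(doc)
    return combined


# Usage with made-up data: evaluate the current model on two company D invoices.
predictions = [{"total": "100.00", "vendor": "D Corp"}, {"total": "55.10", "vendor": "D Corp"}]
ground_truth = [{"total": "100.00", "vendor": "D Corp"}, {"total": "55.00", "vendor": "D Corp"}]
score = field_accuracy(predictions, ground_truth)  # 3 of 4 fields correct -> 0.75
action = next_action(score)
print(score, action)  # 0.75 merge datasets, retrain base model

if action == "merge datasets, retrain base model":
    old_docs = [Document("A", "a1.pdf"), Document("B", "b1.pdf"), Document("C", "c1.pdf")]
    new_docs = [Document("D", "d1.pdf"), Document("D", "d2.pdf")]
    training_set = build_training_set(old_docs, new_docs)
    print(sorted({doc.company for doc in training_set}))  # ['A', 'B', 'C', 'D']
```

The point of `build_training_set` is that the retraining input always contains the older companies' documents alongside the new ones, rather than company D's documents alone.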
Key Considerations:
- "v.0" refers to the base model version, and "v.1" to the current, intermediate version trained on companies A, B, and C.
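- It is always recommended to retrain the base model on the full, larger dataset (companies A, B, C, and D) rather than training the intermediate version (v.1) on a smaller dataset (only company D). Training on company D's documents alone can override patterns learned from A, B, and C, so the earlier companies' documents should be included.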
- Training on a larger dataset exposes the model to more patterns, which generally improves its overall performance.
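The difference between the recommended and the discouraged retraining paths can be made concrete with some simple version bookkeeping. This is a hypothetical sketch: `ModelVersion`, `plan_full_retrain`, `plan_incremental`, `companies_missing`, and the name "v.2-alt" are illustrative assumptions, and nothing here trains a model. It only records which version each run starts from and which companies' invoices that run sees.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str                   # e.g. "v.0", "v.1", "v.2"
    parent: Optional[str]       # version this one was trained from; None for the base model
    trained_on: FrozenSet[str]  # companies whose invoices were in this training run


def plan_full_retrain(base, current, new_companies, name):
    """Recommended: train the base model on the old and new companies together."""
    return ModelVersion(name, base.name, current.trained_on | frozenset(new_companies))


def plan_incremental(current, new_companies, name):
    """Discouraged: train the intermediate version on the new company's invoices only."""
    return ModelVersion(name, current.name, frozenset(new_companies))


def companies_missing(run, previous):
    """Companies the previous version learned from that this run never sees."""
    return previous.trained_on - run.trained_on


v0 = ModelVersion("v.0", None, frozenset())
v1 = ModelVersion("v.1", "v.0", frozenset({"A", "B", "C"}))

v2 = plan_full_retrain(v0, v1, {"D"}, "v.2")
v2_alt = plan_incremental(v1, {"D"}, "v.2-alt")  # hypothetical name for the discouraged path

print(v2.parent, sorted(v2.trained_on))       # v.0 ['A', 'B', 'C', 'D']
print(sorted(companies_missing(v2, v1)))      # []
print(sorted(companies_missing(v2_alt, v1)))  # ['A', 'B', 'C']
```

In the incremental path, companies A, B, and C are absent from the run that produces the new version. That gap is where previously learned patterns can be overridden.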