Whenever I run the Training, Evaluation, and Full pipelines, which minor version should I select, and why? Can anyone help me with this?
Hello @Sidharth_S_Kadri,
- Training & Full Pipelines: Select Minor Version 0.
- Why: This ensures you train from the clean base model. Training on top of a previously retrained version (such as 1 or 2) can cause “catastrophic forgetting,” where the model loses what it learned earlier and its accuracy degrades.
- Evaluation Pipelines: Select the latest trained version (e.g., 1, 2, or 3).
- Why: You want to measure the performance of the model you just trained, not the untrained base version.
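The rule above can be written down as a small lookup. This is only a sketch to make the logic explicit: `pick_minor_version` and its arguments are hypothetical names made up for illustration and are not part of any UiPath API.

```python
from typing import Sequence


def pick_minor_version(pipeline_type: str, trained_versions: Sequence[int]) -> int:
    """Return the minor version to select for a pipeline run (hypothetical helper)."""
    kind = pipeline_type.strip().lower()
    if kind in ("training", "full"):
        # Training and Full pipelines start from the clean base model.
        return 0
    if kind == "evaluation":
        if not trained_versions:
            raise ValueError("No trained minor version exists to evaluate yet")
        # Evaluate the most recently trained version.
        return max(trained_versions)
    raise ValueError(f"Unknown pipeline type: {pipeline_type!r}")


print(pick_minor_version("Training", [1, 2, 3]))    # prints 0
print(pick_minor_version("Evaluation", [1, 2, 3]))  # prints 3
```

Here "latest" is assumed to mean the highest minor version number; if your versions were not trained in numeric order, pick the one you actually trained last.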
Thanks,
Karthik
I understand the process, but I’m a bit confused about the dataset during retraining. When we retrain the model, should we include the old dataset along with the new data, or should we train using only the new dataset?
Hello @Sidharth_S_Kadri,
Include the old dataset together with the new data, so that you retrain on a cumulative dataset. Here is why:
- Prevents Catastrophic Forgetting: If you train using only new data, the model will start to “forget” how to process the document types and fields it learned previously. This causes overall accuracy to drop significantly.
- Comprehensive Training: For a model to recognize both old and new labels accurately, it needs to see all of them in a single, well-rounded dataset during the training session.
- Model Stability: A larger, more representative dataset—combining your original high-quality labels with new data from the Validation Station—creates a more stable and high-performing model.
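As a rough illustration of what "cumulative" means here, the sketch below merges an old labeled set with newly validated documents. The dictionaries, file names, and the `build_cumulative_dataset` helper are all hypothetical; the real export format differs, and the choice to let newer labels replace older ones for the same document is an assumption of this example.

```python
def build_cumulative_dataset(old_docs: dict, new_docs: dict) -> dict:
    """Merge old and new labeled documents into one training set (hypothetical helper).

    Keys are document names and values are their field labels.
    Assumption: if a document appears in both sets, the newer labels win.
    """
    merged = dict(old_docs)   # start from everything the model was trained on before
    merged.update(new_docs)   # add new documents and overwrite re-labeled ones
    return merged


old_docs = {
    "invoice_001.pdf": {"total": "120.00"},
    "invoice_002.pdf": {"total": "75.50"},
}
new_docs = {
    "invoice_002.pdf": {"total": "75.60"},  # corrected label for an existing document
    "receipt_001.pdf": {"total": "12.99"},  # new document type
}

cumulative = build_cumulative_dataset(old_docs, new_docs)
print(len(cumulative))  # prints 3
```

The point is simply that the training run sees both the original and the new documents in one dataset, rather than the new ones alone.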
Thanks,
Karthik
