Why should minor version 0 (the base model of the version) be selected every time the ML model is retrained?
The documentation recommends that whenever new documents are added to increase the dataset size, you should "Always add Validation Station data to same dataset and train on ML Package minor version 0 (zero)".
Read more in Training High Performing Models.
The reason is a known weakness of pre-trained Deep Learning models: iterative training, where each run starts from the previously retrained minor version instead of the base model, eventually causes the model to forget what it learned earlier. This effect is called catastrophic interference, also known as catastrophic forgetting. To avoid it, always add all the data to the same dataset and train on ML Package minor version 0, the base model of the version.
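The effect can be reproduced on a very small scale. The sketch below is only a toy illustration, not the actual ML Package training code: it assumes a three-weight linear model trained with plain gradient descent, and every name in it (`train`, `base_model`, `original_data`, `validation_station_data`) is hypothetical. It compares retraining a previously trained version on the new data alone against retraining the base model on the combined dataset.

```python
def predict(weights, features):
    """Linear model: weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def train(weights, dataset, epochs=2000, lr=0.1):
    """Return new weights after per-sample gradient descent on squared error."""
    weights = list(weights)  # copy, so the starting model is left untouched
    for _ in range(epochs):
        for features, target in dataset:
            error = predict(weights, features) - target
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights


def mse(weights, dataset):
    """Mean squared error of the model on a dataset."""
    return sum((predict(weights, f) - t) ** 2 for f, t in dataset) / len(dataset)


# Assumption: all-zero weights stand in for minor version 0 (the base model).
base_model = [0.0, 0.0, 0.0]

# Both datasets are consistent with the same underlying weights (1, 2, 3).
original_data = [((1, 0, 0), 1.0), ((0, 1, 1), 5.0)]
validation_station_data = [((0, 1, 0), 2.0)]

# Iterative approach: train version 1 on the original data,
# then retrain version 1 on the new data only.
version_1 = train(base_model, original_data)
iterative = train(version_1, validation_station_data)

# Recommended approach: add the new data to the same dataset
# and retrain starting from the base model.
combined = train(base_model, original_data + validation_station_data)

error_before = mse(version_1, original_data)
error_iterative = mse(iterative, original_data)
error_combined = mse(combined, original_data)

print("version 1 on original data:        ", error_before)
print("iterative retrain on original data:", error_iterative)
print("combined retrain on original data: ", error_combined)
```

In this toy setup, the iteratively retrained model ends up with a clearly non-zero error on the original data (about 0.125), even though the new data does not contradict the old data. The model retrained from the base on the combined dataset fits both sets with an error close to zero.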