This split ensures that the model has enough data to learn effectively while still having a sufficient number of documents to evaluate its performance accurately.
No exact number of documents can be prescribed, only a percentage. A common best practice is to use 80% of the documents for training and the remaining 20% for evaluation.
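As a rough illustration of an 80/20 split, here is a minimal Python sketch. It assumes the documents are available as a simple list of file names; the helper name `split_documents`, the `train_fraction` and `seed` parameters, and the sample file names are all hypothetical and not part of any particular tool.

```python
import random


def split_documents(documents, train_fraction=0.8, seed=42):
    """Shuffle the documents and split them into training and evaluation sets.

    Hypothetical helper for illustration only; the names and defaults
    are assumptions, not part of any specific product or library.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")

    # Copy the input so the caller's list is left untouched.
    shuffled = list(documents)

    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    # Number of documents that go to training (80% by default).
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Example with 50 made-up document names.
docs = [f"doc_{i:03d}.pdf" for i in range(50)]
train_docs, eval_docs = split_documents(docs)
print(len(train_docs), len(eval_docs))  # prints "40 10"
```

Shuffling before the cut keeps the evaluation set from being just the last documents collected, which could differ systematically from the rest. With very small collections, the 20% share may amount to only a handful of documents, so the percentage is best treated as a starting point.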