Splitting documents for training

What is the recommended split of documents for training and evaluation, considering a total of 15 documents per vendor?

A. 12 documents for training the model, and 3 for evaluating the model.

B. 8 documents for training the model, and 7 for evaluating the model.

C. 7 documents for training the model, and 8 for evaluating the model.

D. 10 documents for training the model, and 5 for evaluating the model.

@Latifa,

I think Option A is the right answer here.

This split ensures that the model has enough data to learn effectively while still having a sufficient number of documents to evaluate its performance accurately.

There is no fixed number of documents, only a percentage. The best practice is to use 80% for training and the remaining 20% for evaluation. With 15 documents per vendor, that works out to 12 for training and 3 for evaluation.
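If it helps, here is a minimal sketch of the 80/20 rule of thumb in plain Python. It is only an illustration, not anything from the product itself: the function name `split_documents`, the file names, and the fixed seed are all made up for this example.

```python
import random

def split_documents(documents, train_fraction=0.8, seed=0):
    """Shuffle the documents and split them into training and evaluation sets.

    Hypothetical helper for illustration only.
    """
    shuffled = list(documents)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 15 placeholder documents for one vendor (file names are invented).
docs = [f"vendor_a_doc_{i:02d}.pdf" for i in range(1, 16)]
train_docs, eval_docs = split_documents(docs)
print(len(train_docs), len(eval_docs))  # 12 3
```

Shuffling before splitting avoids always evaluating on the same tail of the list, and no document ends up in both sets.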

Refer to this for more details:
Document Understanding - Training High Performing Models


@ashokkarale Thank you for confirming this.
