Training dataset in Document Understanding

Hello all.
What is the recommended number of documents per vendor to train the initial dataset in Document Understanding?
A. 5
B. 10
C. 15
D. 20
Thank you


There is no single recommended number, since documents can vary even for the same vendor. Just one rule of thumb: the more, the better.

Refer to this documentation for more details:

Ashok 🙂

It depends on the number of document types you have: if you have 4 different types of invoices, you need fewer documents than if you have 20.

The DU center guides you on this and gives you suggestions, and the new AI-driven ‘modern’ method will also reduce the number of documents you need.
I’d say 50 is a minimum and 100 is a good start, but it also depends on whether you start from an existing model or a brand-new one from scratch.

It’s not a simple A/B/C/D answer, as there are too many variables.

You need to train on 30+ files for each template.
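
As a rough sketch of what that rule of thumb implies, the total grows linearly with the number of templates. The function name and the assumption that every template needs the same number of files are illustrative only, not part of Document Understanding itself:

```python
def min_training_files(num_templates: int, files_per_template: int = 30) -> int:
    """Lower bound on training files, assuming every template needs the same count.

    The default of 30 comes from the "30+ files for each template" rule of thumb;
    treat it as a starting point, not an official figure.
    """
    if num_templates < 0 or files_per_template < 0:
        raise ValueError("counts must be non-negative")
    return num_templates * files_per_template


# Example: 4 invoice layouts versus 20 invoice layouts.
print(min_training_files(4))   # 120
print(min_training_files(20))  # 600
```

So with 4 layouts you would be collecting at least 120 files, and with 20 layouts at least 600, before accounting for variation within a layout.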


20 is the recommended number per vendor…

For more details, check this:

