Training dataset in Document Understanding

What is the recommended number of documents per vendor to train the initial dataset in Document Understanding?
There can't be a single recommended number per vendor, because documents vary from vendor to vendor. Just one rule of thumb: the more, the better.

Refer to this documentation for more details:

Ashok :slight_smile:

It depends on the number of document types you have: if you have 4 different types of invoices, you need fewer documents than if you have 20.

The DU center guides you on this and gives you suggestions, and the new AI-driven ‘modern’ method will also reduce the number of documents you need.
I’d say 50 is a minimum and 100 is good to start with, but it also depends on whether you start from an existing model or a brand new one from scratch.

It’s not a simple yes / no answer, as there are too many variables.

You need to train on 30+ files for each template.
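
As a rough illustration of how a per-template figure scales, here is a minimal Python sketch. The function name `minimum_training_files` and its parameters are made up for this example and are not part of any UiPath API. The default of 30 is only the figure suggested in this reply, and the template counts 4 and 20 echo the earlier reply. Treat the result as a back-of-the-envelope lower bound, not an official recommendation.

```python
def minimum_training_files(template_count: int, files_per_template: int = 30) -> int:
    """Estimate a lower bound on training files (hypothetical helper).

    Assumes every template needs the same number of samples, which is a
    simplification: real layouts differ in how hard they are to learn.
    """
    if template_count < 0 or files_per_template < 0:
        raise ValueError("counts must be non-negative")
    return template_count * files_per_template


# 4 invoice templates at 30 files each
print(minimum_training_files(4))   # 120
# 20 invoice templates at 30 files each
print(minimum_training_files(20))  # 600
```

Swapping in a different per-template figure (for example `files_per_template=20`) shows how quickly the total changes with the rule of thumb you pick.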


20 is the recommended number per vendor…

For more details check this


