ML skill model

Aki1111 · May 3, 2023, 6:03am

Hi,

If we are processing 5000 documents, how much training data should we use to train the model?

arjunshenoy · May 3, 2023, 6:05am

It is recommended to keep the training and evaluation data in a ratio of 80:20.
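To make the 80:20 ratio concrete, here is a minimal Python sketch of the split. It assumes all 5000 documents are labeled and identified by simple integer IDs; the function name `split_documents` and the fixed seed are only illustrative and are not part of any UiPath API.

```python
import random


def split_documents(doc_ids, train_ratio=0.8, seed=42):
    """Shuffle document IDs and split them into training and evaluation sets.

    Hypothetical helper for illustration; not a UiPath function.
    """
    shuffled = list(doc_ids)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Assumption: 5000 labeled documents, numbered 0..4999.
train_ids, eval_ids = split_documents(range(5000))
print(len(train_ids), len(eval_ids))
```

With 5000 labeled documents this gives 4000 for training and 1000 for evaluation, with no document in both sets.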

As per UiPath’s official documentation:

For Regular fields, you need at least 20-50 document samples per field. So, if you need to extract 10 regular fields, you need at least 200-500 document samples. If you need to extract 20 regular fields, you need at least 400-1000 document samples. The number of document samples you need increases with the number of fields: roughly 20-50 more samples for each additional field.

For Column fields, you need at least 50-200 document samples per column field, so for 5 column fields with clean and simple layouts you might get good results with 300 document samples. For highly complex and diverse layouts, it might require over 1000 document samples.

To cover multiple languages, you need at least 200-300 document samples per language, assuming they cover all the different fields. So, for 10 header fields and 4 column fields with 2 languages, 600 document samples might be enough (400 for the columns and headers, plus 200 for the additional language), but some cases might require 1200 or more document samples.

Classification fields generally require at least 10-20 document samples from each class.
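The arithmetic behind these guidelines can be sketched as a small Python helper. This is a rough estimate only: it assumes the per-field ranges simply add up and that each language beyond the first adds 200-300 samples, which is how the 600-sample example above works out. The function name `estimate_samples` and its parameters are made up for illustration.

```python
def estimate_samples(regular_fields=0, column_fields=0, languages=1):
    """Return a (low, high) estimate of the document samples needed.

    Hypothetical helper; assumes the guideline ranges are additive:
      - 20-50 samples per regular field
      - 50-200 samples per column field
      - 200-300 samples per language beyond the first
    """
    extra_languages = max(languages - 1, 0)
    low = 20 * regular_fields + 50 * column_fields + 200 * extra_languages
    high = 50 * regular_fields + 200 * column_fields + 300 * extra_languages
    return low, high


# 10 regular fields only: matches the 200-500 range quoted above.
print(estimate_samples(regular_fields=10))

# 10 header fields, 4 column fields, 2 languages: the low end is 600.
print(estimate_samples(regular_fields=10, column_fields=4, languages=2))
```

The low end is the optimistic case for clean, simple layouts. Complex or diverse layouts may need the high end or more, so treat the result as a starting point rather than a guarantee.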

Hope this helps,
Best Regards.

Aki1111 · May 3, 2023, 10:46am

Thank you for letting me know.

