ML skill model


If we are processing 5000 documents, how much of the data should be used for training?

Hi @Aki1111

It is recommended to keep the training and evaluation data in a ratio of 80:20.
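
As a rough illustration of that ratio, here is a minimal Python sketch of an 80:20 split. It assumes all 5000 documents are labelled and identified by simple integer IDs; the function name `split_documents` and the fixed seed are made up for this example and are not part of any UiPath tooling.

```python
import random


def split_documents(doc_ids, train_ratio=0.8, seed=42):
    """Shuffle document IDs and split them into training and evaluation sets."""
    shuffled = list(doc_ids)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical IDs 0..4999 standing in for the 5000 documents.
train_ids, eval_ids = split_documents(range(5000))
print(len(train_ids), len(eval_ids))  # 4000 1000
```

Shuffling before the cut helps keep both sets representative when the documents arrive grouped by vendor or layout.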

As per UiPath’s official documentation:

  1. Regular fields (date, total amount)
  • For Regular fields, you need at least 20-50 document samples per field. So, if you need to extract 10 regular fields, you need at least 200-500 document samples; for 20 regular fields, at least 400-1000. The number of document samples grows with the number of fields, at roughly 20-50 samples per field.
  2. Column fields (item unit price, item quantity)
  • For Column fields, you need at least 50-200 document samples per column field. For 5 column fields with clean and simple layouts, you might get good results with 300 document samples; highly complex and diverse layouts might require over 1000. To cover multiple languages, you need at least 200-300 document samples per language, assuming they cover all the different fields. So, for 10 header fields and 4 column fields with 2 languages, 600 document samples might be enough (400 for the columns and headers, plus 200 for the additional language), but some cases might require 1200 or more.
  3. Classification fields (currency)
  • Classification fields generally require at least 10-20 document samples from each class.
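
To turn these ranges into a quick estimate, a small Python sketch like the one below may help. It simply adds up the per-field ranges quoted above, which is an assumption on my part (the documentation gives guidelines, not a formula), and the function name `estimate_sample_range` is hypothetical. Classification fields are left out because their requirement is per class rather than per field.

```python
def estimate_sample_range(regular_fields, column_fields, extra_languages=0):
    """Return a (low, high) estimate of document samples needed.

    Uses the quoted guideline ranges: 20-50 samples per regular field,
    50-200 per column field, and 200-300 per additional language.
    Adding the ranges together is an assumption, not an official rule.
    """
    low = regular_fields * 20 + column_fields * 50 + extra_languages * 200
    high = regular_fields * 50 + column_fields * 200 + extra_languages * 300
    return low, high


# The documentation's example: 10 header fields, 4 column fields, 2 languages
# (that is, one additional language).
print(estimate_sample_range(10, 4, extra_languages=1))  # (600, 1600)
```

The lower bound matches the 600 samples mentioned in the documentation's example; treat the upper bound only as a rough ceiling, since complex layouts may need more.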

Hope this helps,
Best Regards.

Thank you for letting me know.
