Validation Document vs Training Document

Hello,

I am currently retraining the OOTB Invoice model, and have noticed that when I import documents for training, 20% of those documents are tagged as Validation Documents (as opposed to Training Documents). In the UiPath Documentation, I cannot find any details about what these Validation Documents are used for, and thus how I should approach labeling them. There is a high-level note that these documents are used in the Training Pipeline, which makes sense given that I did not designate them as Evaluation Documents.

Can anyone provide some clarity as to what these Validation Documents are used for within the training pipeline, and how that differs from the rest of the Training Documents?

Thank you in advance.

Hi @frankie.amendola, welcome to the Community.

Validation documents are a subset of the training dataset that is used to assess the performance of an ML model during training. These documents are not used to update the model’s parameters; instead, they are used to measure performance metrics, such as accuracy or F1 score, at different stages of the training process.

Typically, a portion of the training dataset is held out as a validation dataset, and the remaining documents are used to update the model parameters. The model is trained on the training dataset, and after each training iteration, it is evaluated on the validation dataset to determine whether it is overfitting or underfitting the training data.
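To make the hold-out idea concrete, here is a minimal Python sketch of a training/validation split. It is only an illustration and not UiPath's internal logic: the function name `split_train_validation`, the file names, and the fixed seed are all made up for the example, and the 20% fraction simply mirrors the share mentioned in the question.

```python
import random

def split_train_validation(documents, validation_fraction=0.2, seed=0):
    """Shuffle documents and hold out a fraction as the validation set.

    Hypothetical helper for illustration only; the names and the default
    fraction are assumptions, not part of any UiPath API.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(documents)             # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]   # held out: never used to update parameters
    training = shuffled[n_validation:]     # used to update the model parameters
    return training, validation

# Made-up file names standing in for imported documents.
docs = [f"invoice_{i:03d}.pdf" for i in range(50)]
training_docs, validation_docs = split_train_validation(docs)
print(len(training_docs), len(validation_docs))  # 40 10
```

The point to take away is that every document lands in exactly one of the two sets, so the validation score is always measured on documents the model did not learn from.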

If the model is overfitting, it will perform well on the training dataset but poorly on the validation dataset; in that case training is usually stopped earlier or rerun with changes, such as more training data or stronger regularization. If the model is underfitting, it will perform poorly on both the training and validation datasets, and the model architecture may need to be modified.
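As a rough illustration of that rule of thumb, the check can be written as a small Python function. This is a sketch under stated assumptions: `diagnose_fit` is a hypothetical helper, and the `good_score` and `max_gap` thresholds are arbitrary example values, not numbers taken from UiPath or from any particular model.

```python
def diagnose_fit(train_score, validation_score, good_score=0.90, max_gap=0.05):
    """Classify a training run from its training and validation scores (0.0 to 1.0).

    Hypothetical helper; both thresholds are assumptions chosen for the example.
    """
    # Poor on both datasets: the model has not learned the task well enough.
    if train_score < good_score and validation_score < good_score:
        return "underfitting"
    # Clearly better on training data than on held-out data.
    if train_score - validation_score > max_gap:
        return "overfitting"
    return "reasonable fit"

print(diagnose_fit(0.98, 0.80))  # overfitting
print(diagnose_fit(0.70, 0.68))  # underfitting
print(diagnose_fit(0.95, 0.93))  # reasonable fit
```

In practice the right thresholds depend on the metric and the document type, so treat the numbers above as placeholders.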

The following resources have more information on these:

  1. Training-Evaluation dataset & balanced datasets section of this doc: https://docs.uipath.com/document-understanding/automation-cloud/latest/user-guide/training-high-performing-models

  2. https://docs.uipath.com/document-understanding/standalone/2020.10/user-guide/training-and-evaluation-pipelines

Hope this helps,
Best Regards.

This is very helpful, thank you Arjun. Quick follow-up question: I still would not be able to perform a full pipeline run with just training and validation documents, correct? I imagine I would still need a separate set of documents designated as Evaluation documents in order to perform the evaluation portion of the full pipeline run.

@frankie.amendola

You are correct. In order to perform a full pipeline run and obtain accurate evaluation metrics, you would typically need a separate set of documents designated as evaluation (test) documents. This set would be distinct from the training and validation sets, and would be used to evaluate the performance of the pipeline on previously unseen data.
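For illustration, here is a minimal Python sketch of keeping the three sets distinct. It is not how the platform assigns documents: `three_way_split`, the fractions, and the file names are assumptions made up for the example. The evaluation set is carved off first, and the validation set is then held out from what remains.

```python
import random

def three_way_split(documents, evaluation_fraction=0.2, validation_fraction=0.2, seed=0):
    """Split documents into disjoint training, validation, and evaluation sets.

    Hypothetical helper; the fractions are example values, not UiPath defaults.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)

    # Set the evaluation (test) documents aside first so training never sees them.
    n_evaluation = round(len(shuffled) * evaluation_fraction)
    evaluation = shuffled[:n_evaluation]
    remaining = shuffled[n_evaluation:]

    # Hold out part of what is left as the validation set.
    n_validation = round(len(remaining) * validation_fraction)
    validation = remaining[:n_validation]
    training = remaining[n_validation:]
    return training, validation, evaluation

# Made-up file names standing in for labeled documents.
all_docs = [f"invoice_{i:03d}.pdf" for i in range(100)]
train_set, validation_set, evaluation_set = three_way_split(all_docs)
print(len(train_set), len(validation_set), len(evaluation_set))  # 64 16 20
```

Because no document appears in more than one set, the evaluation metrics reflect performance on data the model has not seen in any form during training.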

Best Regards.