How to create an Evaluation Pipeline?

To run a full pipeline, how do we build the data for the evaluation pipeline?
Does it need to be different from the training data, and what is the minimum number of documents that should be added for evaluation?

Hi @Ana_Patricia,

Could you let us know what kind of model / ML Package you are trying to deploy / evaluate?

We follow the 80/20 rule for training and evaluation pipelines: from the whole dataset we receive, we keep 80% for training and 20% for evaluation. However, you could also keep a larger percentage of the data for evaluation.
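If you want to do the 80/20 split yourself before importing the documents, here is a minimal Python sketch. It only splits a list of file names; the function name `split_documents`, the `invoice_NNN.pdf` names and the fixed seed are assumptions made for illustration, not part of any product API.

```python
import random


def split_documents(doc_names, eval_fraction=0.2, seed=42):
    """Shuffle document names and split them into (training, evaluation) lists."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(doc_names)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one document for evaluation.
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical file names, used only for illustration.
docs = [f"invoice_{i:03d}.pdf" for i in range(50)]
train_docs, eval_docs = split_documents(docs)
print(len(train_docs), len(eval_docs))  # 40 10
```

Shuffling before the split keeps the evaluation set from being just the last files in the folder, which often share a vendor or date. To keep more data for evaluation, raise `eval_fraction`, for example to 0.3.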

I created a generic ML model for customer billing invoices. So you mean we don't need to manually import a separate set of data for evaluation? Will the pipeline take it from the dataset imported for labelling?
If not, and I have labelled 50 documents, should I create another dataset for evaluation, with documents other than those 50 labelled docs?
And what is the ideal Epochs value for this type of model?

@Ana_Patricia,

If Data Labelling was used, you can mark the data / batch that you want as the evaluation dataset. That data is then ignored when the export folder is used for the training pipeline.
Check the docs below for more info:

So the batch that we mark as the evaluation set should contain different documents from the batch that we use as input for training?

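@Ana_Patricia,

Yes, that would be the ideal case. You could also use the same documents, but generally a different set should be used to understand the model's accuracy, since scores measured on documents the model was trained on tend to look better than its results on new documents.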
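If you keep the training and evaluation batches as separate lists of files, a quick check like the one below can confirm that no document ended up in both. This is only a sketch: the function name `overlapping_documents` and the file names are made up for illustration.

```python
def overlapping_documents(train_docs, eval_docs):
    """Return the sorted document names that appear in both sets."""
    return sorted(set(train_docs) & set(eval_docs))


# Hypothetical batches, used only for illustration.
train_batch = ["invoice_001.pdf", "invoice_002.pdf", "invoice_003.pdf"]
eval_batch = ["invoice_003.pdf", "invoice_004.pdf"]

overlap = overlapping_documents(train_batch, eval_batch)
print(overlap)  # ['invoice_003.pdf']
```

An empty list means the two batches are fully separate. Note that this compares file names only, so the same invoice saved under two different names would not be caught.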