To run a full pipeline, how do we build the data for the evaluation pipeline?
Does it need to be different from the training data, and what is the minimum number of documents that should be added for evaluation?
Hi @Ana_Patricia,
Could you let us know what kind of Model/ML Package you are trying to deploy/evaluate?
We follow the 80/20 rule for training and evaluating a pipeline: of the whole dataset we receive, we keep 80% for training and 20% for evaluation. However, you could also keep a larger percentage of the data for evaluation.
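As a rough illustration of the 80/20 rule, here is a minimal sketch in plain Python, assuming the labelled documents are just a list of file names (the `invoice_NNN.pdf` names are hypothetical). It only shows the arithmetic of a shuffled hold-out split; it is not how the pipeline itself splits data.

```python
import random


def train_eval_split(documents, eval_fraction=0.2, seed=42):
    """Shuffle a copy of the documents and hold out eval_fraction of them for evaluation."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed so the split is reproducible
    n_eval = round(len(docs) * eval_fraction)
    return docs[n_eval:], docs[:n_eval]  # (training, evaluation)


# Hypothetical example: 50 labelled invoices split 80/20
invoices = [f"invoice_{i:03d}.pdf" for i in range(50)]
train, evaluation = train_eval_split(invoices)
print(len(train), len(evaluation))  # 40 10
```

Raising `eval_fraction` (e.g. to 0.3) corresponds to keeping a larger percentage for evaluation; the two lists never share a document.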
I created a generic ML model for customer billing invoices. So you mean we don't need to manually import a separate set of data for evaluation, and the pipeline will take it from the dataset imported for labelling?
If not, and I have labelled 50 documents, should I create another dataset for evaluation with documents other than those 50 labelled ones?
And what is the ideal number of epochs for this type of model?
If Data Labelling was used, you can mark the dataset/batch you want as the evaluation dataset. That data is then ignored when the export folder is used for the training pipeline.
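Conceptually, marking a batch as the evaluation set amounts to a filter on each document's batch. The sketch below assumes a hypothetical record with `name` and `batch` fields; it does not reflect the actual export folder format, only the idea that evaluation-tagged documents are kept out of training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledDoc:
    name: str   # hypothetical document identifier
    batch: str  # hypothetical batch/tag assigned during labelling


def split_by_batch(docs, eval_batch):
    """Documents in eval_batch go to evaluation; everything else goes to training."""
    train = [d for d in docs if d.batch != eval_batch]
    evaluation = [d for d in docs if d.batch == eval_batch]
    return train, evaluation


docs = [
    LabelledDoc("inv_01.pdf", "batch_a"),
    LabelledDoc("inv_02.pdf", "batch_a"),
    LabelledDoc("inv_03.pdf", "batch_eval"),
]
train, evaluation = split_by_batch(docs, "batch_eval")
print([d.name for d in train])       # ['inv_01.pdf', 'inv_02.pdf']
print([d.name for d in evaluation])  # ['inv_03.pdf']
```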
Check the docs below for more info:
So the batch we mark as the evaluation set should contain documents different from the batch we use as input for training?
Yes, that would be the ideal case. You could also use the same documents, but generally a different set is needed to understand the model's accuracy on documents it has not seen during training.
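If you want to double-check that the evaluation batch really contains different documents, one simple approach is to compare content hashes. This is a minimal sketch, assuming both sets are available as hypothetical `{file name: file bytes}` dictionaries; it only catches byte-identical copies, so a re-scanned or re-saved version of the same invoice would not be flagged.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def overlapping_documents(train_docs, eval_docs):
    """Return names of evaluation documents whose exact content also appears in training."""
    train_hashes = {fingerprint(c) for c in train_docs.values()}
    return sorted(name for name, c in eval_docs.items() if fingerprint(c) in train_hashes)


# Hypothetical file contents standing in for real invoices
train_docs = {"inv_01.pdf": b"invoice 1 contents", "inv_02.pdf": b"invoice 2 contents"}
eval_docs = {"inv_03.pdf": b"invoice 3 contents", "copy_of_01.pdf": b"invoice 1 contents"}
print(overlapping_documents(train_docs, eval_docs))  # ['copy_of_01.pdf']
```

An empty result means no evaluation document is an exact copy of a training document.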