In the Academy courses, we also label evaluation sets within Data Labeling. However, I don't fully understand what the evaluation set is used for and why we label it. There are both validation sets and evaluation sets, and I am fairly inexperienced in this area. I would appreciate it if you could help me understand these concepts better.
As you said, there are two types of sets in AI Center:
Evaluation set
Training set
When we are labelling the documents in Document Manager,
→ we label 20% of the documents for evaluation.
→ we label the remaining 80% of the documents for training.
The difference:
→ The training set is used to train your ML Skill.
→ When we run the pipeline, the evaluation set is used to compare the data the model extracts against the labeled evaluation data. This comparison happens in the backend.
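To illustrate the idea (this is not AI Center's actual backend code, just a minimal Python sketch): the labeled documents are split roughly 80/20, the model trains on the 80%, and evaluation then compares the model's extracted fields against the held-out labels. The function names and the exact-match metric here are hypothetical, for illustration only.

```python
import random

def split_labeled_docs(docs, eval_ratio=0.2, seed=42):
    """Shuffle the labeled documents and split them into
    a training set (~80%) and an evaluation set (~20%)."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_eval = int(len(docs) * eval_ratio)
    return docs[n_eval:], docs[:n_eval]  # (training, evaluation)

def field_accuracy(extracted, labeled):
    """Compare model-extracted field values against the labeled
    ground truth and return the fraction that match exactly."""
    matches = sum(1 for field, value in labeled.items()
                  if extracted.get(field) == value)
    return matches / len(labeled) if labeled else 0.0

# Example: 10 labeled documents, each a dict of field -> value
docs = [{"invoice-no": f"INV-{i:03d}", "total": str(100 + i)}
        for i in range(10)]
train_set, eval_set = split_labeled_docs(docs)
print(len(train_set), len(eval_set))  # 8 2

# A simulated model output for one evaluation document,
# with one field extracted incorrectly
extracted = dict(eval_set[0])
extracted["total"] = "0"
print(field_accuracy(extracted, eval_set[0]))  # 0.5
```

The key point the sketch shows: the evaluation documents are labeled by you, but the model never trains on them, so comparing its extractions to those labels gives an honest accuracy score.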
As far as I know, when we upload the data, the system automatically splits it into a 20% validation set and an 80% training set. However, an evaluation set is added separately. What I really want to understand is the difference between the validation set and the evaluation set.
Additionally, as I understand it, when I select “Full Pipeline” in the pipeline step, the evaluation set is used to test the trained model. If that is correct, a question arises: the trained model should be tested on unseen raw data, yet we label the evaluation set ourselves during the data labeling step. So on what basis does the model evaluate itself?
I would appreciate it if you could also help with these issues.