Evaluation Set and Validation Set in Data Labeling

Hello everyone,

In the Academy courses, we also label evaluation sets in Data Labeling. However, I don't fully understand what the evaluation set is used for or why we label it. There are both validation sets and evaluation sets, and I am a bit inexperienced in this area. I would appreciate it if you could help me understand these concepts better.

Hi @tuncay.caglak

There are two types of sets in AI Center:

  1. Evaluation set
  2. Training set

When we label the documents in Document Manager:
→ 20% of the documents are labeled for evaluation.
→ The remaining 80% are labeled for training.
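For illustration only, here is a minimal Python sketch of an 80/20 split like the one described above. `split_documents` and the file names are hypothetical; this is not how Document Manager actually assigns documents to each set, it just shows the idea of two disjoint groups.

```python
import random

def split_documents(doc_ids, eval_fraction=0.2, seed=42):
    """Shuffle document IDs and return (training, evaluation) lists.

    Hypothetical helper: the ratio mirrors the 80/20 split above, but this is
    not Document Manager's internal logic.
    """
    ids = list(doc_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_eval = round(len(ids) * eval_fraction)
    # No document ends up in both sets
    return ids[n_eval:], ids[:n_eval]


train, evaluation = split_documents([f"invoice_{i:03d}.pdf" for i in range(50)])
print(len(train), len(evaluation))  # 40 10
```

The key property is that the two groups do not overlap, so the evaluation documents are never used to train the model.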

The difference:
→ The training set is used to train your ML Skill.
→ When you run the pipeline, the values the model extracts from the evaluation set documents are compared with the labels you assigned to those documents. This comparison happens in the backend.
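To make that comparison concrete, here is a rough Python sketch that scores extracted values against the labeled values, field by field. The function name, the data layout and the case/whitespace-insensitive matching rule are all assumptions for illustration; AI Center's real scoring and reported metrics may differ.

```python
from collections import defaultdict

def per_field_accuracy(predictions, ground_truth):
    """For each field, the share of evaluation documents whose extracted value matches the label."""
    hits = defaultdict(int)
    seen = defaultdict(int)
    for doc_id, labeled_fields in ground_truth.items():
        extracted = predictions.get(doc_id, {})  # a missing document counts as nothing extracted
        for field_name, expected in labeled_fields.items():
            seen[field_name] += 1
            value = extracted.get(field_name)
            # Case- and whitespace-insensitive exact match (an assumption, not UiPath's rule)
            if value is not None and value.strip().lower() == expected.strip().lower():
                hits[field_name] += 1
    return {name: hits[name] / seen[name] for name in seen}


labels = {  # what you labeled in the evaluation set
    "doc1": {"invoice_no": "INV-001", "total": "120.00"},
    "doc2": {"invoice_no": "INV-002", "total": "75.50"},
}
model_output = {  # what the trained model extracted from the same documents
    "doc1": {"invoice_no": "inv-001 ", "total": "120.00"},
    "doc2": {"invoice_no": "INV-002", "total": "57.50"},
}
print(per_field_accuracy(model_output, labels))  # {'invoice_no': 1.0, 'total': 0.5}
```

A per-field breakdown like this shows which fields the model still struggles with (here `total`), which a single overall score would hide.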

Hope it helps!!

1 Like

First of all, thank you for your response.

As far as I know, when we upload the data, the system automatically splits it into a 20% validation set and an 80% training set. However, an evaluation set is added afterward. What I really want to understand is the difference between the validation set and the evaluation set.
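In general machine-learning practice (assumed here to carry over to AI Center, which this thread does not confirm), the validation set is carved out of the training data and used *during* training to make choices, while the evaluation set is held out and used only *after* training. The Python sketch below shows that convention with a toy 1-D k-nearest-neighbours model; the data, `make_samples` and the choice of `k` as the tuned setting are all hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Toy 1-D k-NN: majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(votes * 2 > k)

def accuracy(train, samples, k):
    """Share of samples whose k-NN prediction matches the label."""
    return sum(knn_predict(train, x, k) == y for x, y in samples) / len(samples)

def make_samples(rng, n, noise=0.1):
    """Hypothetical labeled data: label 1 when x > 0.6, with some labels flipped."""
    samples = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < noise:
            y = 1 - y
        samples.append((x, y))
    return samples

rng = random.Random(0)
uploaded = make_samples(rng, 200)   # data uploaded for training
evaluation = make_samples(rng, 50)  # separately labeled evaluation set

# Automatic 80/20 split of the uploaded data into training and validation
rng.shuffle(uploaded)
cut = int(len(uploaded) * 0.8)
training, validation = uploaded[:cut], uploaded[cut:]

# Validation: used during training to pick a setting (here k)
best_k = max([1, 3, 5, 7, 9], key=lambda k: accuracy(training, validation, k))

# Evaluation: used once, after training, on data the model never saw
print("chosen k:", best_k)
print("evaluation accuracy:", accuracy(training, evaluation, best_k))
```

Because the validation set influenced which model was kept, its score is slightly optimistic; the untouched evaluation set gives the fairer final estimate.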

Additionally, as I understand it, when I select "Full Pipeline" in the pipeline step, the trained model is tested on the evaluation set's data. If this is correct, a question arises: the trained model should be tested on raw data, but we process (label) the evaluation set during the data labeling step. So, on what basis does the model evaluate itself?

I would appreciate it if you could also help with these issues.
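One common way to read this (a general assumption, not confirmed in this thread for AI Center): labeling an evaluation document records the correct answers, but the model is only given the document's raw content; the labels act as an answer key when scoring. The Python sketch below shows that flow. `LabeledDocument`, `toy_extractor` and the regex patterns are hypothetical stand-ins for the real documents and ML Skill.

```python
import re
from dataclasses import dataclass, field

@dataclass
class LabeledDocument:
    text: str                                   # raw content the model is allowed to see
    labels: dict = field(default_factory=dict)  # human labels from Data Labeling (answer key)

def toy_extractor(text):
    """Stand-in for a trained ML Skill: extracts fields from raw text only."""
    patterns = {"invoice_no": r"Invoice No:\s*(\S+)", "total": r"Total:\s*([\d.]+)"}
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields

def evaluate(model, documents):
    """Run the model on raw text, then compare its output with the labels."""
    correct = total = 0
    for doc in documents:
        predicted = model(doc.text)  # the labels are NOT passed to the model
        for field_name, expected in doc.labels.items():
            total += 1
            correct += predicted.get(field_name) == expected
    return correct / total


docs = [
    LabeledDocument("Invoice No: INV-001\nTotal: 120.00", {"invoice_no": "INV-001", "total": "120.00"}),
    LabeledDocument("Invoice No: INV-002\nAmount due 75.50", {"invoice_no": "INV-002", "total": "75.50"}),
]
print(evaluate(toy_extractor, docs))  # 0.75
```

The second document's total is missed because its layout differs, and only the stored label reveals that the extraction was wrong.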

image

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.