Classification and ML Skills question

Hi. This is a difficult question to explain. It is about Document Understanding.
I have a project with different kinds of PDF files.
I was thinking of defining both types of documents in the taxonomy.
Later on, I would use the classification step to determine which type each document belongs to.

But then I want to use an ML Skill.
Can I use the same ML Skill for both types of documents?
If so, will both types of documents be in the same dataset? The same pipeline, and therefore the same ML Skill? Or should there be two different ones?

Also, when labeling the dataset, how do I do it if the fields are going to be totally different?

Hi @JavRR

Even if the fields are different, you can define all of them in Data Labeling under one ML Skill.

Say you have 5 fields in document 1 and 5 fields in document 2, which are completely different from those in document 1. You can define all the fields and configure the extractor accordingly, based on the taxonomy.
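
To make the idea concrete, here is a minimal sketch in plain Python (not UiPath code or any UiPath API) of what "define all the fields under one ML Skill" amounts to: the single dataset carries the union of the fields from every document type in the taxonomy. The type names and field names below are hypothetical placeholders.

```python
# Hypothetical taxonomy: document type -> fields defined for that type.
# These names are illustrative only; they do not come from the thread.
TAXONOMY = {
    "DocumentA": ["a_field_1", "a_field_2"],
    "DocumentB": ["b_field_1", "b_field_2"],
}


def union_of_fields(taxonomy):
    """Return every field of every document type, in first-seen order, without duplicates."""
    seen = []
    for fields in taxonomy.values():
        for name in fields:
            if name not in seen:
                seen.append(name)
    return seen


# The one shared labeling schema would contain all of these fields.
ALL_FIELDS = union_of_fields(TAXONOMY)
print(ALL_FIELDS)
```

When labeling a given document, only the fields that belong to its type get a value; the rest stay empty.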

Thanks.


Thanks. I will try it.
And I suppose I also have to classify the documents in Data Labeling?

@JavRR

Can I use the same ML Skill for both types of documents?
  • Yes, you can use the same ML Skill for both types of documents.
Will both types of documents be in the same dataset?
  • It's not mandatory. But you have to run multiple pipelines if you create multiple datasets. Try to use a single dataset unless you have a specific requirement.
Same pipeline, so the same ML Skill?
  • If you are using the same ML Skill to train on both document types, then the same pipeline must be used. If you want to create different ML Skills, then different pipelines are required. Also, it depends on the license you acquire.
How will I do it if the fields are going to be totally different?
  • You have to create generic fields that apply to both documents. If you intend to use multiple ML Skills, then create different fields for each document.


Yes @JavRR

For all.
I don’t understand how it is working, but it is working.

I created a taxonomy with 2 different kinds of documents.
Document A has 2 different tables from 2 different sheets.
Document B has 1 table with completely different values.
So there is no correlation.

I uploaded 10 documents A and 10 documents B to the dataset in AI Center, in 2 different batches. I labeled docs A with their 2 respective tables and left the label for the 3rd table blank. And of course on docs B I labeled only that single table and left the other two blank. Both sets of labels were present. The important thing is that I set the classification when labeling.
I exported everything and ran the pipeline, with only 1 ML Skill. And it worked!!!

In my workflow, it classifies the type of document and uses the same ML Skill.
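
For anyone who finds this later, the flow I ended up with can be sketched roughly like this. This is plain Python for illustration only, not UiPath activities or any real API; the names `table_1`, `table_2`, `table_3` and the sample values are made up. The point is that the classifier decides the document type, the single shared model returns all fields, and only the fields the taxonomy defines for that type are kept.

```python
# Hypothetical mapping from document type to the fields it owns in the taxonomy.
FIELDS_BY_TYPE = {
    "DocumentA": {"table_1", "table_2"},
    "DocumentB": {"table_3"},
}


def keep_fields_for_type(doc_type, extracted):
    """Keep only the extracted fields that the taxonomy defines for doc_type."""
    allowed = FIELDS_BY_TYPE[doc_type]
    return {name: value for name, value in extracted.items() if name in allowed}


# Made-up output of the one shared model for a Document B:
# the fields that belong to Document A come back empty.
raw = {"table_1": None, "table_2": None, "table_3": [["x", "1"]]}

# doc_type would come from the classification step.
result = keep_fields_for_type("DocumentB", raw)
print(result)
```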

Thanks, everyone. Your comments were very helpful, and I didn’t get stuck because of this question.