Classification, ML Skills doubt

JavRR · April 27, 2022, 5:49pm

Hi. This is a difficult question to explain. It is about Document Understanding.
I have a project where there are different kinds of PDF files.
I was thinking of defining both types of documents in the taxonomy.
Later on, I would use the classification step to determine which type each document belongs to.

But then I want to use an ML Skill.
Can I use the same ML Skill for both types of documents?
If so, will both types of documents be in the same dataset? The same pipeline, and therefore the same ML Skill? Or should there be two different ones?

Also, when labeling the dataset, how do I do it if the fields are going to be totally different?

suraj.setty · April 27, 2022, 6:01pm

Hi @JavRR

Even if the fields are different, you can define all of them in Data Labeling under one ML Skill.

Say you have 5 fields in document 1 and 5 fields in document 2, and they are completely different from those in document 1. You can define all the fields and configure the extractor accordingly, based on the taxonomy.
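As a rough illustration of the idea above (not UiPath code), the shared label set can be thought of as the union of the fields defined for each document type. The sketch below assumes a taxonomy held as a plain dictionary. The names `TAXONOMY` and `build_label_schema`, and all field names, are hypothetical.

```python
# Hypothetical taxonomy: document type -> fields defined for that type.
# The field names are made up for illustration only.
TAXONOMY = {
    "document_1": ["invoice_number", "invoice_date", "vendor", "total", "currency"],
    "document_2": ["policy_id", "holder_name", "start_date", "end_date", "premium"],
}


def build_label_schema(taxonomy):
    """Return the union of all fields, in first-seen order, without duplicates."""
    schema = []
    for fields in taxonomy.values():
        for field in fields:
            if field not in schema:
                schema.append(field)
    return schema


# One shared schema that a single dataset (and a single model) would be labeled against.
LABEL_SCHEMA = build_label_schema(TAXONOMY)
print(len(LABEL_SCHEMA))  # 10
```

With 5 fields per document type and no overlap, the shared schema has 10 fields. Each labeled document fills in only the fields of its own type and leaves the rest empty.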

Thanks.

JavRR · April 27, 2022, 6:24pm

Thanks. I will try it.
And I suppose I also have to classify in the data labeling?

ushu · April 27, 2022, 6:27pm

@JavRR

Can I use the same ML Skill for both types of documents?

Yes, you can use the same ML Skill for both types of documents.

Will both types of documents be in the same dataset?

It's not mandatory. But you have to run multiple pipelines if multiple datasets were created. Try to use a single dataset unless you have a specific requirement.

The same pipeline, and therefore the same ML Skill?

If you are using the same ML Skill for both document types, then the same pipeline must be used to train it. If you want to create different ML Skills, then different pipelines are required. It also depends on the license you acquire.

How do I do it if the fields are going to be totally different?

You have to create generic fields that apply to both documents. If you intend to use multiple ML Skills, then you create different fields for each document.

suraj.setty · April 27, 2022, 6:50pm

Yes @JavRR

JavRR · June 3, 2022, 4:35pm

For everyone:
I don’t understand how it is working, but it is working.

I created a taxonomy with 2 different kinds of documents.
Document A has 2 different tables from 2 different sheets.
Document B has 1 table with completely different values.
So there is no correlation.

I uploaded 10 A documents and 10 B documents to the dataset in AI Center, in 2 separate batches. I labeled the A documents with their 2 respective tables and left the label for the 3rd table blank. On the B documents, of course, I labeled only that single table and left the rest blank. Both labels were present. The important thing is that I set the classification when labeling.
I exported everything and ran the pipeline, with only 1 ML Skill. And it worked!!!

In my workflow, it classifies the type of document and then uses the same ML Skill.
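The flow described here, classify first and then call one shared extractor, can be sketched in plain Python. This is only an illustration under stated assumptions, not the UiPath activities themselves. `FIELDS_BY_TYPE`, `process`, `fake_classify`, and `fake_extract` are hypothetical stand-ins for the taxonomy, the workflow, the classifier, and the single ML Skill.

```python
from typing import Callable, Dict, List

# Hypothetical taxonomy: document type -> tables (fields) that belong to it.
FIELDS_BY_TYPE: Dict[str, List[str]] = {
    "A": ["table_1", "table_2"],
    "B": ["table_3"],
}


def process(document: str,
            classify: Callable[[str], str],
            extract: Callable[[str], Dict[str, object]]) -> Dict[str, object]:
    """Classify the document, run the one shared extractor, keep only its type's fields."""
    doc_type = classify(document)
    raw = extract(document)  # the same extractor is called for every document type
    return {name: raw.get(name) for name in FIELDS_BY_TYPE[doc_type]}


# Stand-ins for the real classifier and the single trained model (assumptions).
def fake_classify(document: str) -> str:
    return "A" if "sheet" in document else "B"


def fake_extract(document: str) -> Dict[str, object]:
    # A single model trained on both types can emit a value for any known field.
    return {"table_1": "rows-1", "table_2": "rows-2", "table_3": "rows-3"}


result = process("two sheets report", fake_classify, fake_extract)
print(result)  # {'table_1': 'rows-1', 'table_2': 'rows-2'}
```

The point of the sketch is that the classification result decides which fields are kept, which is one possible reason a single model trained on unrelated document types can still give clean results per type.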

Thanks, everyone. Your comments were very helpful, and I didn’t get stuck on this question.
