ML Skill does not extract the expected details

I created a Document Understanding package, trained it with sample files using a pipeline, and then created an ML Skill.

When I ran the skill in Studio, it fetched the desired data for the documents I provided, except for one particular field in one particular document. I then added that document to the dataset, retrained the package, selected the trained version, and deployed it. DU still failed to label that field.

Please check the image below, which shows the document as trained in AI Center.

I want to process the same document with the funding amount as 970, but it is not able to fetch it. What could be the potential issue?

@Veera_Boddula

Did you do the data labeling again for that one field?

cheers

@Anil_G Thank you for the reply. Please check the screenshot attached to my original post. I have already done the data labeling.

@Veera_Boddula

Did you label only one document?

At least 5 documents need to be labelled for the model to get trained.

Please include more files of this type.

cheers

Hi Anil,

Could I kindly ask you to check the screenshot I attached? I trained on 10 documents of the same format.

@Veera_Boddula

If you check the question, it clearly says you added the field only for one particular document, not for all. So it is not clear from the post; as written, it looks like no other document has that field.

Can you confirm that all 10 have that field and that you labelled it in all 10 documents?
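
If it helps, a quick way to double-check this outside the UI is something like the sketch below. It assumes you can get a per-document list of labelled field names from somewhere; the `labels` dictionary, the file names, and the field names are all made up for illustration and do not reflect the actual Data Labeling export format.

```python
def docs_missing_field(labels, field):
    """Return the documents whose labelled fields do not include `field`."""
    return sorted(doc for doc, fields in labels.items() if field not in fields)


# Hypothetical data: document name -> set of labelled field names.
labels = {
    "doc_01.pdf": {"funding_amount", "applicant_name"},
    "doc_02.pdf": {"applicant_name"},
    "doc_03.pdf": {"funding_amount", "applicant_name"},
}

# Lists every document where the field was never labelled.
print(docs_missing_field(labels, "funding_amount"))
```

An empty list would mean the field is labelled in every document, so the cause is probably somewhere else.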

cheers

Hi @Anil_G ,

Sorry if my question created confusion. I trained on 10 documents of the same format, with all the fields labeled in all 10 documents. The last document, the one in the screenshot, is the one I am trying to process, but with the funding amount as 470. It was trained with the value 480. If I process it with the funding amount as 460, the bot can fetch the data. The only problem is with the funding amount as 470.

Hi @Veera_Boddula ,

Generally, we would train with more than 10 documents. Even if 10 documents is the minimum, we go with 20-25 of the same format and then train the model with this dataset.

Also, remember that at each stage of retraining, you should train the base version with the full set of labelled data.

Let us know if you are able to label 20-25 documents of the same format. If the field is still not extracted after that, we can check whether the problem is in the field mapping.

Also, not much information is given about the fields or about how the training was done. How many versions of the model have already been created?

Hi @supermanPunch,

Training was done via a Train pipeline using the dataset exported from the Data Labeling session. I will try feeding it 25 docs and let you know the outcome. It was trained only on base version 0. The ML Skill is deployed using version 1.

Thanks
Veera

@Veera_Boddula

When you retrain the model from version 0, you will also get a new version other than 1, and that is the one to select. Did you select the latest one?

Training with more documents would also help.
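
On the version point, the idea is simply "pick the newest version whose training actually succeeded". A rough sketch of that rule is below; the `versions` list, its keys, and the status strings are hypothetical and typed in by hand, not something returned by AI Center.

```python
def latest_successful_version(versions):
    """Return the highest minor version whose training succeeded, or None."""
    ok = [v["minor"] for v in versions if v["status"] == "Successful"]
    return max(ok) if ok else None


# Hypothetical list of package versions and their pipeline status.
versions = [
    {"minor": 0, "status": "Successful"},
    {"minor": 1, "status": "Successful"},
    {"minor": 2, "status": "Failed"},
]

# The version to point the ML Skill at under these assumptions.
print(latest_successful_version(versions))
```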

cheers

@Veera_Boddula ,

Alright. Let's keep the base version as 0 and move forward with the pipeline training. When an additional training dataset is added, train on the whole dataset and not just the additional data that was added.

Let us know once you are able to use the ML Skill and whether you observe any improvement.

@Anil_G

Please check the screens below.

ML Package Version

Pipeline

The pipeline ran once, with type “Train”.

Thanks
Veera

@Veera_Boddula

Can you create a new pipeline instead of restarting the same one, please?

cheers

Hi, try creating a new regular field, then export the dataset with a name, and make sure to check the (ALL LABELED) option.

Then go to the pipeline, use the minor version, and use the Machine Learning Extractor in Studio.
Wait until the pipeline status shows successful, and use the package only if it is successful.
Then create a new evaluation to see the score.


And most important, create a new training run, because this is the package to use in the ML_SKILL:

Then update the ML_SKILL to the new version like this: