Document Understanding - Train Data

Hi,

I developed a project to extract data from invoices, using the Machine Learning Extractor and the Machine Learning Extractor Trainer. I want to train the model based on human feedback. I created a project in AI Center, then created an ML package and an ML skill from the Out-of-the-box > Invoices model. I then labelled a dataset using the schema and created a pipeline with auto retraining enabled.

What I need to know is: when a human validates some data, how is it used for training? Can we retrain the pre-trained model with our new data?
During data labelling, what happens if I add a new regular field that is not in the schema? Can I add a new regular field?

With our new human-validated data, will it retrain the full model, or will it retrain only on the dataset we gave?

Follow the link above to retrain the Invoices model, and let me know if you need any help.

Thank you very much. Also, can we download the datasets from the public endpoints and train on them?

@Ashmi_Uththama_Handunge

This is how the flow goes:

  1. Create a dataset in AI Center.
  2. Use that dataset when training and creating the skill, and enable auto retrain and auto upgrade for the ML skill.
  3. In the process, whenever newly validated data comes in, upload the file to the dataset created in step 1.
  4. Because retraining is enabled, at the next retrain interval all the documents present in the dataset are used for training, and a new model and skill are created. The new skill is consumed automatically, and the cycle continues.
  5. If you want to add new fields, you also need to change the taxonomy, then label all the documents for the new field and train on them.
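
To make the cycle in steps 3 to 5 a bit more concrete, here is a minimal Python sketch. It is only a toy model and does not call AI Center: `Dataset`, `Skill`, `retrain` and the field names are all hypothetical, and treating "every document must carry an entry for every schema field" as the relabelling rule is my simplification of step 5.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for an AI Center dataset (not a real API)."""
    documents: dict = field(default_factory=dict)  # file name -> {field: value}

    def upload(self, name, labels):
        # Step 3: every newly validated document lands in the same dataset.
        self.documents[name] = dict(labels)


@dataclass
class Skill:
    """Hypothetical stand-in for an ML skill with auto upgrade enabled."""
    version: int = 0
    trained_on: int = 0


def retrain(dataset, skill, schema):
    """Step 4: a scheduled run uses ALL documents currently in the dataset."""
    # Step 5 (simplified): after a schema change, documents that have no
    # entry for the new field must be labelled before training can proceed.
    unlabelled = sorted(name for name, labels in dataset.documents.items()
                        if not set(schema) <= set(labels))
    if unlabelled:
        raise ValueError(f"label these documents first: {unlabelled}")
    return Skill(version=skill.version + 1, trained_on=len(dataset.documents))


schema = ["invoice-no", "total"]          # hypothetical field names
ds = Dataset()
ds.upload("a.pdf", {"invoice-no": "1", "total": "10"})
skill = retrain(ds, Skill(), schema)      # first interval: 1 document
ds.upload("b.pdf", {"invoice-no": "2", "total": "20"})
skill = retrain(ds, skill, schema)        # next interval: both documents
```

The point of the sketch is that each retrain run sees the whole accumulated dataset, not just the latest validated file, and that adding a field to `schema` blocks the run until the older documents have been labelled for it.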

Hope this helps

cheers
