Invoice Model Retraining Issue - Pipeline failed due to ML Package Issue

Hello All,

I hope you all are doing well and staying safe.

I would appreciate your help with the following issue:

I am working on a PoC for invoice processing, which will require retraining the model on the samples used.

Below are the steps I followed until I got the error “Pipeline failed due to ML Package Issue”:

  1. Created a project in AI Center
  2. Created an ML package for the Invoice model - Version 9
  3. Created an ML Skill for the package - Major Version 9, Minor Version 0.
  4. Created an empty dataset
  5. Created a UI project to extract data using the ML Skill, added a validation station, then added a Train Extractor scope to store the validated results in an output folder, and lastly zipped the output.
    In Taxonomy Manager, I selected only 3 fields: Invoice No., Invoice Date and Total.
  6. Created a Data Manager session. Imported the Invoice schema and deleted all the fields except the 3 above.
  7. Imported the validated results and exported them to the dataset created in step 4.
  8. Created a Train pipeline with the dataset and got the error mentioned above.

I tried with 10 files as well, but when the data is exported from Data Manager to the dataset, no Images folder is created. What could be the issue?
Also, after uploading the schema and before exporting the dataset, I deleted all the fields that are not being used.

Log.txt (11.4 KB)

Please help in identifying the issue and possible solution.

Also, what is the minimum number of documents required to be sent for retraining to attain good results?

Thanking you in anticipation!

@alexcabuz @AntonMcG @Jeremy_Tederry @Adam_Hanson @Ioana_Gligan

The issue is resolved. When selecting the dataset in the pipeline, you need to select the right folder inside the dataset.

Hey, I have exactly the same issue! Which one is the right folder inside the dataset? And how can I select a folder? I can only select a dataset in the pipeline settings.

It would be great if you could help me out here!

Best regards

Hello Susanna,

This is what I followed and it worked as expected:
Once the data labeling steps are completed, download the dataset.

Then either create a new dataset and upload the downloaded files to it, or create another folder in the same dataset that was used for data labeling and upload them there.
Once done, create a pipeline, select the new dataset (or the new folder of the existing dataset) and run it.
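
In case it helps, here is a minimal Python sketch for checking a downloaded copy of the dataset locally before uploading it and creating the pipeline. It lists every folder that directly contains an images subfolder, which may help narrow down which folder to pick. Note the assumptions: the subfolder name `images` is taken from the "Images folder" mentioned in the original post, and the function name `find_training_folders` is purely illustrative. The exact layout AI Center expects is not confirmed in this thread.

```python
from pathlib import Path


def find_training_folders(export_root, images_dir_name="images"):
    """Return folders under export_root that directly contain an images subfolder.

    Assumption: the subfolder is called "images" (compared case-insensitively).
    Adjust images_dir_name if your export uses a different name.
    """
    root = Path(export_root)
    # Check the root itself, then every nested directory in sorted order.
    folders = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    candidates = []
    for folder in folders:
        has_images = any(
            child.is_dir() and child.name.lower() == images_dir_name.lower()
            for child in folder.iterdir()
        )
        if has_images:
            candidates.append(folder)
    return candidates
```

For example, calling `find_training_folders("path/to/unzipped/export")` (a hypothetical local path) returns a list of `Path` objects. An empty list would suggest that the export did not produce an images folder at all, which matches the symptom described in the first post.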

I would expect a new folder to be created in the dataset when exporting directly from the data labeling session, but I am not sure how to achieve that.

I hope this helps, let me know in case of any queries.
