Train extraction dataset

I have 3 invoice layouts, with 50 examples of each. I have completed the labeling session for them. Now I want to run an automated extraction training. See image below.

When I run the automated training, it gives an error. See error logs below:

[
  {
    "entityId": "8cbad905-2f90-ee11-8925-0022489a7f9f",
    "timeStamp": "2023-12-01T09:50:16.693Z",
    "level": "INFO",
    "messageKey": "EXPORT_STARTED",
    "messageParams": {}
  },
  {
    "entityId": "8cbad905-2f90-ee11-8925-0022489a7f9f",
    "timeStamp": "2023-12-01T09:50:25.6473644",
    "level": "Information",
    "messageKey": "Extraction Export Event: Status is: started",
    "messageParams": null
  },
  {
    "entityId": "8cbad905-2f90-ee11-8925-0022489a7f9f",
    "timeStamp": "2023-12-01T09:56:23.187Z",
    "level": "ERROR",
    "messageKey": "GENERIC_ERROR",
    "messageParams": {}
  },
  {
    "entityId": "8cbad905-2f90-ee11-8925-0022489a7f9f",
    "timeStamp": "2023-12-01T09:56:23.4729785",
    "level": "Information",
    "messageKey": "Extraction Export Event: Status is: error",
    "messageParams": null
  },
  {
    "entityId": "8cbad905-2f90-ee11-8925-0022489a7f9f",
    "timeStamp": "2023-12-01T09:56:23.4935431",
    "level": "Error",
    "messageKey": "Export failed: ",
    "messageParams": null
  }
]

I am not sure what the issue is. Any advice?

The error logs you provided indicate a generic error during the extraction export process. Unfortunately, the logs don’t provide specific details about the nature of the error. However, here is some general advice on how to troubleshoot and address such issues going forward:

  1. Check Data Quality:
  • Ensure that the labelled data is accurate and representative of the actual invoices. If there are inaccuracies in the labelling, it can affect the model’s performance.
  2. Review Labeling Schema:
  • Verify that the labelling schema used during the labelling session is consistent with the expectations of the automated training system. Check if the labels and entities match the requirements of the extraction model.
  3. Input Format:
  • Confirm that the input format (image or other data) provided for training matches the format expected by the automated training system. The error might be due to a mismatch in the data format.
  4. Model Configuration:
  • Check the configuration settings for the automated training process. Ensure that the model parameters, hyperparameters, and other settings are appropriate for your task.
  5. Data Volume:
  • Assess if the amount of training data is sufficient for the model to learn the extraction patterns. If the dataset is too small or lacks diversity, the model may struggle to generalize.
  6. Resource Availability:
  • Ensure that there are no resource constraints during the training process. Insufficient memory, processing power, or disk space can lead to errors.
  7. Review Documentation:
  • Consult the documentation provided by the automated training platform for any specific requirements or troubleshooting steps. There might be platform-specific considerations that need attention.
  8. Contact Support:
  • If the issue persists and you can’t identify the root cause, consider reaching out to the support team of the automated training platform. They may be able to provide specific insights based on the platform’s internals.
  9. Logs and Debugging:
  • Enable additional logging or debugging options if available. This might provide more detailed information about the error, helping you pinpoint the issue.
  10. Iterative Testing:
  • If possible, conduct iterative testing by making small changes to the input data or training parameters to identify when the error occurs and under what conditions.
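
For the logs step, a small script can at least pull out the error entries and show how long the export ran before failing. The following is only a sketch: it uses a shortened copy of the log from the question (with plain quotes), and the helper name `parse_ts` is made up for illustration. It is not part of any platform API.

```python
import json
import re
from datetime import datetime

# Shortened copy of the log from the question (only the fields used below).
LOG = """[
  {"timeStamp": "2023-12-01T09:50:16.693Z", "level": "INFO", "messageKey": "EXPORT_STARTED"},
  {"timeStamp": "2023-12-01T09:50:25.6473644", "level": "Information", "messageKey": "Extraction Export Event: Status is: started"},
  {"timeStamp": "2023-12-01T09:56:23.187Z", "level": "ERROR", "messageKey": "GENERIC_ERROR"},
  {"timeStamp": "2023-12-01T09:56:23.4729785", "level": "Information", "messageKey": "Extraction Export Event: Status is: error"},
  {"timeStamp": "2023-12-01T09:56:23.4935431", "level": "Error", "messageKey": "Export failed: "}
]"""


def parse_ts(ts: str) -> datetime:
    """Parse the two timestamp shapes seen in the log (hypothetical helper).

    Some entries end in 'Z' with 3 fractional digits, others have 7 fractional
    digits and no 'Z'. The fraction is normalised to 6 digits for strptime.
    """
    match = re.match(r"(.*T\d{2}:\d{2}:\d{2})(?:\.(\d+))?$", ts.rstrip("Z"))
    if match is None:
        raise ValueError(f"unexpected timestamp format: {ts!r}")
    base = match.group(1)
    frac = (match.group(2) or "0")[:6].ljust(6, "0")
    return datetime.strptime(f"{base}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


entries = json.loads(LOG)

# The log mixes "ERROR" and "Error", so compare case-insensitively.
errors = [e for e in entries if e["level"].lower() == "error"]

started = parse_ts(entries[0]["timeStamp"])
first_error = parse_ts(errors[0]["timeStamp"])
elapsed = (first_error - started).total_seconds()

for e in errors:
    print(e["timeStamp"], e["level"], repr(e["messageKey"]))
print(f"Seconds from export start to first error: {elapsed:.3f}")
```

In this log the export ran for roughly six minutes before failing, and neither error entry carries a message, so the log alone cannot narrow down the cause.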

Go to the labelling session and make sure you have at least 10 samples marked for each field (that is, each field is marked in at least 10 documents). Also, there is an export button in the data labelling session; try exporting once from there and see if it throws any error.
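
If you can list which fields are labelled in which document, the 10-documents-per-field check is easy to script. This is a rough sketch only: the `labels_by_doc` mapping, the field names, and the function name are all hypothetical, and how you obtain that mapping depends on what your labelling tool lets you export.

```python
from collections import Counter

MIN_DOCS_PER_FIELD = 10  # threshold suggested above


def under_labeled_fields(labels_by_doc, expected_fields, minimum=MIN_DOCS_PER_FIELD):
    """Return {field: document_count} for fields labelled in fewer than `minimum` documents.

    labels_by_doc: dict mapping a document name to the field names labelled in it
    (hypothetical structure, not a real export format).
    """
    counts = Counter()
    for fields in labels_by_doc.values():
        # set() so a field labelled twice in one document counts once
        counts.update(set(fields))
    return {f: counts[f] for f in expected_fields if counts[f] < minimum}


# Made-up example: 12 documents, "due_date" labelled in only one of them.
labels_by_doc = {
    f"invoice_{i:03d}.pdf": ["invoice_number", "total"] for i in range(12)
}
labels_by_doc["invoice_000.pdf"].append("due_date")

missing = under_labeled_fields(
    labels_by_doc, ["invoice_number", "total", "due_date"]
)
print(missing)
```

Any field that shows up in the result needs more labelled documents before you retry the training.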

The issue resolved itself. It is not clear why or how.