How to troubleshoot the error "Training Job failed, error: Automatic re-training enabled but latest.txt file is missing from root of dataset directory"?
Issue Description: During pipeline execution, the training job fails with the error above. The message indicates that the "latest.txt" file is missing from the root of the dataset directory. This typically happens when the auto-retraining feature is enabled but the dataset path provided does not contain a "latest.txt" file at its root.
Root Cause: The "latest.txt" file is required by the auto-retraining feature in the pipeline configuration. When this feature is set to "True", the pipeline expects to find "latest.txt" at the root of the dataset directory and uses it to determine the latest version of the dataset to retrain on.
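The check described in the root cause can be pictured as a simple pre-flight validation. The sketch below is only an illustration of that logic, not the pipeline's actual implementation: the function name `check_dataset_root` and the exception class `TrainingJobError` are hypothetical, and it assumes the check looks only at the top level of the dataset directory, as the error message states.

```python
from pathlib import Path
from typing import Optional


class TrainingJobError(RuntimeError):
    """Hypothetical error type used only for this illustration."""


def check_dataset_root(dataset_dir: str, auto_retrain: bool) -> Optional[Path]:
    """Hypothetical sketch of the pre-flight check behind this error.

    Returns None when auto-retraining is disabled (latest.txt is not needed),
    returns the path to latest.txt when it exists at the dataset root, and
    raises TrainingJobError when auto-retraining is enabled but the file is
    missing from the root.
    """
    if not auto_retrain:
        # Without auto-retraining, latest.txt is not required.
        return None
    marker = Path(dataset_dir) / "latest.txt"
    # Only the root of the dataset directory is checked, not its subfolders.
    if not marker.is_file():
        raise TrainingJobError(
            "Training Job failed, error: Automatic re-training enabled but "
            "latest.txt file is missing from root of dataset directory"
        )
    return marker
```

Under these assumptions, the two resolution paths below correspond to the two ways of passing this check: provide "latest.txt" at the root (Case 1) or turn auto-retraining off (Case 2).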
Resolution Steps: Follow the steps for your use case:
Case 1: Auto-Retrain Pipeline
To run an auto-retrain pipeline, the "latest.txt" file must be present at the root of the specified dataset path. To ensure this:
- Verify the Dataset Path: Double-check the dataset path you provided in the pipeline configuration. For example, if "/dataset/exports" is specified as the dataset path, make sure the "latest.txt" file is present at the root level of the "/dataset/exports" folder.
- Generate "latest.txt" File: To create the "latest.txt" file, schedule an export in the Document Manager session. For details on scheduling exports, refer to Document Manager: Schedule Export Feature.
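If you have file-system access to the dataset location, a small script can help with the "Verify the Dataset Path" step by distinguishing a missing file from one that sits in a subfolder. This is a hypothetical helper (the name `diagnose_dataset_path` and its messages are illustrative, not part of the product), and it assumes the dataset path can be read as an ordinary local or mounted directory.

```python
from pathlib import Path


def diagnose_dataset_path(dataset_dir: str) -> str:
    """Hypothetical helper: report where latest.txt is relative to dataset_dir."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return f"{root} does not exist or is not a directory"
    # The pipeline looks for latest.txt directly at the root of the dataset path.
    if (root / "latest.txt").is_file():
        return f"OK: {root / 'latest.txt'} found at the dataset root"
    # Otherwise, look deeper to catch a dataset path that points one level too high.
    nested = sorted(p for p in root.rglob("latest.txt") if p.is_file())
    if nested:
        return (
            f"latest.txt found only in subfolders, e.g. {nested[0].parent}; "
            "check whether the dataset path should point there"
        )
    return "latest.txt not found; schedule an export to generate it"
```

For example, with "/dataset/exports" configured as the dataset path, running `print(diagnose_dataset_path("/dataset/exports"))` reports whether the file is at the root, only in a subfolder, or absent.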
Case 2: Non-Auto-Retrain Pipeline
If you are not running an auto-retrain pipeline, modify the pipeline configuration as follows:
- Remove Auto-Retrain Parameter: When creating the pipeline, make sure the "auto-retraining" parameter in the Environment Variables section is not set to "True". If the parameter is present, set it to "False".
- Choose Correct Dataset: Select the correct input dataset and evaluation dataset for the pipeline, and confirm that they are suitable for the pipeline's intended purpose.
- Restart the Pipeline: After making the necessary changes, restart the pipeline to apply the updated configuration.
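Before restarting, it can help to double-check which value the "auto-retraining" variable will actually carry. The sketch below assumes you have the Environment Variables as key/value pairs (for example, copied into a dictionary); the function name `auto_retrain_enabled` is hypothetical, and treating the value case-insensitively, with a missing variable meaning "off", is an assumption rather than documented platform behavior.

```python
from typing import Mapping


def auto_retrain_enabled(env: Mapping[str, str]) -> bool:
    """Hypothetical check: is auto-retraining switched on in these variables?

    Assumptions: the variable is named "auto-retraining", its value is
    compared case-insensitively after trimming whitespace, and an absent
    variable is treated as "False".
    """
    value = env.get("auto-retraining", "False")
    return value.strip().lower() == "true"
```

With these assumptions, `{"auto-retraining": "True"}` means the pipeline will require "latest.txt" (Case 1), while `{"auto-retraining": "False"}` or an empty set of variables means it will not (Case 2).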
Additional Assistance:
If the issue persists after following the steps above, further troubleshooting is required. To do so, provide a screenshot of the pipeline details, including the input dataset selection and the parameters set. Make sure the screenshot shows the value set in the "Parameters" dropdown.
With this information, it is possible to understand the configuration better and offer more targeted support to resolve the issue and ensure a successful pipeline run.