AI Center Pipeline Failed

A pipeline in AI Center has failed. How do you troubleshoot it?

A pipeline in AI Center can fail for various reasons. Follow the steps below to troubleshoot these issues.

Types of Error Messages:

  • [Error] Check that document type data is in dataset folder and follows folder structure.

Root Cause:
This error occurs when the dataset is not in the format expected by the ML model.


Troubleshooting Steps:
The folder provided for training must be in dataset format. Ensure that the path provided is correct and that the dataset was exported from Document Manager.
Refer to Document Manager - Export Documents: Dataset Format.

  • For scheduled pipelines in an auto-retraining loop, select the folder that contains the exports from the data labelling sessions and latest.txt.
  • [Error] Images/ directory does not exist / is empty for invoices dataset.
Root Cause:
The dataset path provided for the training dataset or the evaluation dataset is empty.

Troubleshooting Steps:
Update the dataset path for training or evaluation, as required by the pipeline. Ensure that the provided path exists in the AIC Dataset tab, in the same tenant as the pipeline, and that the path is not empty.
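As a quick pre-check for the two errors above, a local copy of the exported dataset can be inspected before it is uploaded. The following is a minimal sketch, not an official tool: the function name find_dataset_problems is hypothetical, and the only layout it assumes is what the error messages mention (an images folder and a split.csv file at the top level). The authoritative layout is the one described in the Document Manager dataset format documentation.

```python
from pathlib import Path


def find_dataset_problems(dataset_dir):
    """Return a list of problems found in a local copy of a dataset folder.

    Assumption: the dataset root holds an images folder and a split.csv file.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return [f"{root} is not an existing folder"]

    problems = []
    # Compare names case-insensitively, since the error text says "Images/".
    entries = {p.name.lower(): p for p in root.iterdir()}

    images = entries.get("images")
    if images is None or not images.is_dir():
        problems.append("images folder is missing")
    elif not any(images.iterdir()):
        problems.append("images folder is empty")

    split = entries.get("split.csv")
    if split is None or not split.is_file():
        problems.append("split.csv is missing")

    return problems
```

An empty list means only that these basic checks passed. It does not prove that the pipeline will accept the dataset.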
  • [Error] Training and / or test set is empty, verify that training / test split is correctly set in split.csv .
Root Cause:
split.csv is part of the dataset for DU ML models and lists the documents marked for training and evaluation. Training documents are marked as TRAIN or VALIDATE, and evaluation documents are marked as TEST. Data Manager automatically splits the training documents between TRAIN and VALIDATE in an 80:20 ratio. If no documents are marked as TRAIN or VALIDATE (typically because split.csv or the dataset was modified manually), the above error occurs.

Troubleshooting Steps:
Open the DM session associated with the dataset and export the dataset again, making sure that the right filters are set. Exporting again ensures that the dataset is produced without any issues. Use the path of the newly exported dataset in the pipeline.
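To see at a glance whether a subset is empty, the markers in split.csv can be counted. This is a minimal sketch under a stated assumption: the exact column layout of split.csv is not described here, so the code scans every cell of each row for one of the markers TRAIN, VALIDATE, or TEST instead of relying on a column position. The function name count_subsets and the sample rows are hypothetical.

```python
import csv
import io

SUBSETS = ("TRAIN", "VALIDATE", "TEST")


def count_subsets(csv_text):
    """Count rows per subset marker found anywhere in the row."""
    counts = {name: 0 for name in SUBSETS}
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            label = cell.strip().upper()
            if label in counts:
                counts[label] += 1
                break  # count each row at most once
    return counts


# Invented sample content, for illustration only.
sample = "doc1.pdf,TRAIN\ndoc2.pdf,TRAIN\ndoc3.pdf,TEST\n"
counts = count_subsets(sample)
empty = [name for name, n in counts.items() if n == 0]
print(counts)  # {'TRAIN': 2, 'VALIDATE': 0, 'TEST': 1}
print(empty)   # ['VALIDATE']
```

For a real file, pass its contents, for example the result of reading split.csv as text. Any subset reported as empty is a reason to export the dataset again rather than edit split.csv by hand.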
  • [Error] Training failed for pipeline type: FULL_TRAINING, error: Full / evaluation pipelines require an evaluation dataset. Please re-run the pipeline providing an evaluation dataset .
Root Cause:
This error generally occurs if a full pipeline has been started but an evaluation dataset has not been provided.

Troubleshooting Steps:
An evaluation or full pipeline needs an evaluation dataset that contains evaluation documents (documents marked as TEST in split.csv). Ensure that the path to the evaluation dataset is provided when creating the pipeline.
  • [Error] Evaluation dataset schema is not a subset of the trained model schema .
Root Cause:
This error occurs if the schema (the fields and their configuration) of the evaluation dataset does not match the schema of the ML model. In a full pipeline, this can happen when the training dataset and the evaluation dataset have different schemas.

Troubleshooting Steps:
Ensure that the fields served by the ML model match the schema of the dataset. If required, open the DM session, modify the fields to match the schema of the ML model, and export the dataset again before using it for evaluation. In a full pipeline, ensure that the fields in the evaluation dataset match those in the training dataset. A simple way to guarantee this is to keep both the training and evaluation documents in the same DM session and provide the same dataset path for both the training and evaluation datasets when creating the pipeline. In that case, the provided dataset path must contain both training and evaluation documents.
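The "subset" condition in the error message can be illustrated with a small comparison. This is only a conceptual sketch: the real schema files carry more configuration than shown here, and the field names, the types, and the function name fields_not_in_model are all hypothetical. Each schema is reduced to a mapping of field name to field type, and the function reports evaluation fields that the model schema does not contain with the same type.

```python
def fields_not_in_model(eval_schema, model_schema):
    """Return (field, type) pairs of the evaluation schema missing from the model schema."""
    return sorted(set(eval_schema.items()) - set(model_schema.items()))


# Hypothetical schemas, for illustration only.
model_schema = {"invoice-no": "text", "total": "number", "date": "date"}
eval_schema = {"invoice-no": "text", "total": "text", "po-no": "text"}

print(fields_not_in_model(eval_schema, model_schema))
# [('po-no', 'text'), ('total', 'text')]
```

An empty result corresponds to the case where the evaluation schema is a subset of the model schema. Any pair in the result points to a field to fix in the DM session before exporting again.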
  • [Error] Unschedulable 0/n nodes are available : Insufficient CPU/memory/GPU .
Root Cause:
This error occurs when the Kubernetes cluster does not have enough free resources (CPU, GPU, or memory).

Troubleshooting Steps:
  • Cloud AIC - Rerun the pipeline after 30 minutes. If the issue persists, gather the details listed in the Cloud AIC Details section below and share them with the support team.
  • On-Prem AIC - Check the CPU/memory/GPU consumption on the node(s) by running the "kubectl describe node" command on each AIC server. Check "Allocated Resources" at the bottom of the output and see whether any resource exceeds 90%. If so, there are not enough hardware resources available to provision the pipeline. Either remove existing pipelines or ML Skills to make room for the new pipeline, or increase the hardware resources. Ensure that the minimum hardware requirements are met.
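For the on-prem check, the 90% rule can be applied to saved "kubectl describe node" output with a few lines of Python. This is a rough sketch, not a supported tool: the function name over_threshold is hypothetical, the sample text is invented for illustration, and the exact layout of the section can vary between Kubernetes versions. The code reads the percentage in the Requests column and skips rows that carry no percentage.

```python
import re

# Matches rows such as "  cpu   1850m (46%)   2 (50%)" and captures
# the resource name and the percentage in the Requests column.
ROW = re.compile(r"^\s*(\S+)\s+\S+\s+\((\d+)%\)")


def over_threshold(describe_output, threshold=90):
    """Return {resource: percent} for requests above the threshold."""
    found = {}
    in_section = False
    for line in describe_output.splitlines():
        if line.lower().startswith("allocated resources"):
            in_section = True
            continue
        if in_section:
            if line and not line[0].isspace():
                break  # next top-level section, for example "Events:"
            match = ROW.match(line)
            if match and int(match.group(2)) > threshold:
                found[match.group(1)] = int(match.group(2))
    return found


# Invented sample output, for illustration only.
SAMPLE_OUTPUT = """\
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1850m (46%)   2 (50%)
  memory             15Gi (93%)    16Gi (99%)
  nvidia.com/gpu     1             1
Events:              <none>
"""

print(over_threshold(SAMPLE_OUTPUT))  # {'memory': 93}
```

Rows without a percentage, such as the GPU row in the sample, are not evaluated by this sketch and still need a manual look.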
  • [Error] No space left on device .
Root Cause:
This error occurs when the dataset used in the pipeline is very large.
Troubleshooting Steps:
  • Cloud AIC - The default allowed dataset size in Cloud AIC is 100 GB. If the dataset size exceeds 100 GB, gather the details listed in the Cloud AIC Details section below and share them with the support team.
  • On-Prem AIC - Check the storage disks for free space and either clean up or extend the storage accordingly.
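A rough comparison between the size of a dataset folder and the free space on a disk can be sketched with the Python standard library. The function names folder_size_bytes and fits_on_disk are hypothetical, and the check is only a lower bound: it is an assumption that the raw dataset size is a useful proxy, since a pipeline typically needs additional working space beyond the dataset itself.

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files below the folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def fits_on_disk(folder, target_path):
    """Return True if the folder's total size is within the free space at target_path."""
    return folder_size_bytes(folder) <= shutil.disk_usage(target_path).free
```

Here, folder is a local copy of the dataset and target_path is any path on the disk being checked. A result of False is a strong hint that storage has to be cleaned up or extended before rerunning the pipeline.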

  • [Error] FileNotFoundError: [Errno 2] No such file or directory: '/workspace/model/microservice/models/multi_task_base/network.p

Root Cause:
Inadequate number of documents in the dataset. The training pipeline needs enough documents to populate both the training and validation subsets. For example, with only 1 document in the dataset, that document is allocated to training and nothing is left for validation.

Troubleshooting Steps:

  • Review the pipeline logs or split.csv in the dataset to check how the documents are split between the TRAIN and VALIDATE subsets. If the pipeline logs list only the TRAIN subset (see screenshot Image_2023-10-12_14-57-47.png), or if only TRAIN documents are listed in split.csv, additional documents need to be added to the dataset. After the documents are labeled, perform a new export from Document Manager so that a proper split between TRAIN and VALIDATE can take place.
  • To check the number of documents in the dataset, review the pipeline log for the number of documents in each subset, or review the split.csv file after the dataset has been exported from Document Manager to confirm how many documents are allocated to TRAIN and how many to VALIDATE.

Information to Share with UiPath Product Support

If the pipeline did not fail because of one of the errors listed above, or if further troubleshooting is required, share the details below with the UiPath Product Support team, according to the environment.

Cloud AIC Details
In the Cloud Tenant where this issue is occurring, gather the following information:

  • AI Center Project Name
  • Share a screenshot of the Pipeline Page (Make sure the problematic pipeline is showing in the list for the screenshot)
  • Pipeline Details: (Click on the pipeline and share a screenshot of the top of the page)
  • Pipeline Report: (Click on the pipeline and then click the Download Pipeline Report button)
  • Pipeline Partial Log: (Click on the pipeline and share a screenshot of the pipeline logs before scrolling down the logs. There could be an error message listed here that would not be visible when exporting the full logs. After taking the screenshot, copy the logs and share them by pasting them into an email. If a message at the bottom of the logs states that partial logs are being displayed, download the full log and share it.)
  • Pipeline Full Log: (Click on the pipeline and then scroll down to the bottom of the pipeline's page. A down arrow should be visible below the partial log. Click this download icon to download the full log to share with support.)

On-Prem AIC Details
  • Version of AIC/AS (including the minor version, for example 2022.10.1)
  • Whether the installation is standalone AIC or Automation Suite
  • Whether the installation is single-node or multi-node
  • Whether the installation is air-gapped or online
  • Support Bundle
  • Diagnostic Logs
  • Share a screenshot of the ML Pipeline Page
  • Pipeline Details: (Click on the pipeline and share a screenshot of the top of the page)
  • Pipeline Logs: (Click on the pipeline and share a screenshot of the pipeline logs before scrolling down the logs. There could be an error message listed here that would not be visible when exporting the full logs. After taking the screenshot, copy the logs and share them by pasting them into an email. If a message at the bottom of the logs states that partial logs are being displayed, download the full log and share it.)
  • Pipeline Full Log: (Click on the pipeline and then scroll down to the bottom of the pipeline's page. A down arrow should be visible below the partial log. Click this download icon to download the full log to share with support.)
  • What was the base model used for training the pipeline? (For example: ML Packages/Out of the box Packages/UiPath Document Understanding/Invoices version 22.10.1.0).