AI Center Pipeline Running Since A Long Time

system · September 14, 2021, 12:45pm

How to check a pipeline that has been running for a long time?

Root Causes:

A pipeline can run longer than expected because of:
- A very large dataset
- A high number of epochs
- Training on CPU instead of GPU
A pipeline can be stuck in the Running state because of:
- An infrastructure issue
- A bug in the product

Troubleshooting Steps:
A pipeline might run for a long time, or be stuck in the Running state, for various reasons. To confirm whether a pipeline is actually stuck or still running, follow the steps below.

Open the pipeline and check the logs section.

If the logs are recent and still streaming, the pipeline is in progress.
If the last log entry was generated a long time ago, the pipeline is stuck. Download the logs using the download button below the logs section and share them with the support engineer. If the download button is missing or disabled, copy the logs from the logs section and share them with the support engineer, along with the details listed under "Information to Share with UiPath Support Engineer" below.
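The check above can also be applied to a downloaded or copied log. The following is a minimal sketch, not an official tool: it assumes each log line starts with an ISO 8601 timestamp followed by a space, which may not match the actual AI Center log format, and the function names and the one-hour silence threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(line):
    """Return the timestamp at the start of a log line, or None if absent.

    Assumption: the line begins with an ISO 8601 timestamp and a space.
    Adjust this function to the real log format.
    """
    token = line.strip().split(" ", 1)[0]
    try:
        stamp = datetime.fromisoformat(token)
    except ValueError:
        return None
    # Assumption for this sketch: timestamps without a zone are UTC.
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def last_log_time(log_text):
    """Return the most recent timestamp found in the log text, or None."""
    stamps = [parse_timestamp(line) for line in log_text.splitlines()]
    stamps = [s for s in stamps if s is not None]
    return max(stamps) if stamps else None


def looks_stuck(log_text, now, max_silence=timedelta(hours=1)):
    """Return True if the newest log entry is older than max_silence."""
    latest = last_log_time(log_text)
    if latest is None:
        # No timestamps found: nothing to go on, so flag for manual review.
        return True
    return now - latest > max_silence


if __name__ == "__main__":
    sample = (
        "2021-09-14T10:00:00 pipeline started\n"
        "2021-09-14T10:05:00 epoch 1 finished\n"
    )
    print(looks_stuck(sample, datetime.now(timezone.utc)))
```

A suitable threshold depends on the model and dataset, since a single epoch on a large dataset can legitimately take a long time between log lines.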

If the pipeline is progressing but has been running for a long time, try the steps below to shorten the next run:

Dataset - Ensure that the dataset is well curated and includes only the required data. A larger dataset does not guarantee a better-trained ML model. Refer to the respective ML model page, or to Training High Performance ML models for DU ML models, for a better understanding of the dataset requirements for training.
Epochs - The number of epochs is the number of times the training algorithm works through the entire dataset. By default, this number is set to 100 for most ML models. It can be reduced to shorten pipeline time by setting the ml_model.epochs parameter of the pipeline. Do this carefully: too few epochs can lead to underfitting, which in turn leads to poor ML model performance.
GPU - Training on a GPU is about 10 times faster than training on a CPU. In fact, for training DU ML models, a GPU is required when the dataset exceeds 1000 documents. Running a pipeline on a GPU consumes more AI units (in cloud AIC), so ensure that enough AI units are available. In an on-prem installation, ensure that a node with a GPU is available before starting a pipeline on GPU.
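To get a feel for how the epoch count and GPU usage affect run time, here is a rough back-of-the-envelope sketch. It assumes training time scales linearly with the number of epochs, which is a simplification, and the per-epoch duration is a made-up figure you would replace with a value observed in your own pipeline logs. Only the ml_model.epochs parameter name, the default of 100, and the 10x GPU figure come from the text above; the function and the dictionary layout are hypothetical.

```python
def estimate_training_minutes(minutes_per_epoch, epochs, speedup=1.0):
    """Rough estimate, assuming time grows linearly with the epoch count.

    speedup is a divisor for faster hardware, e.g. 10.0 for the roughly
    10x GPU-over-CPU figure quoted above.
    """
    return minutes_per_epoch * epochs / speedup


# Hypothetical representation of pipeline parameters for illustration only.
pipeline_params = {"ml_model.epochs": 100}

# Made-up measurement: 3 minutes per epoch on CPU.
minutes_per_epoch = 3.0

default_run = estimate_training_minutes(
    minutes_per_epoch, pipeline_params["ml_model.epochs"]
)
halved_epochs = estimate_training_minutes(minutes_per_epoch, 50)
on_gpu = estimate_training_minutes(
    minutes_per_epoch, pipeline_params["ml_model.epochs"], speedup=10.0
)

print(default_run, halved_epochs, on_gpu)
```

The estimate only covers duration. It says nothing about model quality, so a lower epoch count should still be validated against the risk of underfitting.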

Information to Share with UiPath Support Engineer:

Cloud AIC Details

In the Cloud Tenant where this issue is occurring, gather the following information:

Support ID: (This can be found by navigating to Cloud -> Admin -> Settings. The Support ID is located at the top right of this screen)
URL: (This can be found by navigating to Cloud -> Admin -> Settings)
Account ID: (This can be found by navigating to Cloud -> AICenter -> In the top right of the screen, click the 3 vertical dots -> View Profile)
Tenant ID: (This can be found by navigating to Cloud -> AICenter -> In the top right of the screen, click the 3 vertical dots -> View Profile)
AI Center Project Name:
Share a screenshot of the ML Pipeline Page
Pipeline Details: (Click on the pipeline and share a screenshot of the top of the page)
Pipeline Logs: (Click on the pipeline and share a screenshot of the pipeline logs before scrolling down. An error message may be listed here that is not visible when exporting the full logs. After taking the screenshot, copy the logs and paste them into an email. If a message at the bottom of the logs states that partial logs are being displayed, download the full logs and share them.)
What was the base model used for training the pipeline? (For example: ML Packages/Out of the box Packages/UiPath Document Understanding/Invoices version 22.10.1.0).

On-Prem AIC Details

Version of AIC/AS, including the minor version (e.g., 2022.10.1)
Whether the installation is standalone AIC or Automation Suite
Whether the installation is single node or multi node
Whether the installation is airgapped or online
Support Bundle
Diagnostic Logs
Share a screenshot of the ML Pipeline Page
Pipeline Details: (Click on the pipeline and share a screenshot of the top of the page)
Pipeline Logs: (Click on the pipeline and share a screenshot of the pipeline logs before scrolling down. An error message may be listed here that is not visible when exporting the full logs. After taking the screenshot, copy the logs and paste them into an email. If a message at the bottom of the logs states that partial logs are being displayed, download the full logs and share them.)
Details of base model used for training the pipeline (For example: ML Packages/Out of the box Packages/UiPath Document Understanding/Invoices version 22.10.1.0).
