Pipeline stuck in pending state for a long time (more than 3 days).
Environment:
AI Center on Automation Suite 23.4.3, Air-gapped
Issue: A pipeline is stuck in pending state and has not completed for more than 3 days.
Analysis:
- No GPU is used
- The ML package/model used in the pipeline is version 23.4.2
- Run the commands below to check general node information and confirm that CPU, memory, and disk usage are normal:
- sudo su
- export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
- export PATH="$PATH:/usr/local/bin:/var/lib/rancher/rke2/bin"
- kubectl get nodes
- kubectl describe node
- Run the following command to list all pods:
- kubectl get pods -A
One pod belonging to the pipeline is in ImagePullBackOff status:
uipath 69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc 0/1 ImagePullBackOff 0 3d20h
- Run the following command to inspect this pod:
- kubectl -n uipath describe pod 69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc
The events show that the pod cannot pull the image it needs, du-semistructured:v23.4.2-rc18:
Type    Reason   Age                       From     Message
----    ------   ----                      ----     -------
Normal  Pulling  18m (x1705 over 6d1h)     kubelet  Pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
Normal  BackOff  3m25s (x38513 over 6d1h)  kubelet  Back-off pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
- Run the command below to list all container images available locally (for air-gapped installations):
- sudo podman images
The available version of the du-semistructured image is v23.4.3-rc21, not the required v23.4.2-rc18:
localhost:30071/aicenter/du-semistructured v23.4.3-rc21 e1ca2d9a1dcb 7 weeks ago 20.8 GB
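The pod and image checks above can also be scripted. The following is a minimal sketch, not part of the original procedure. The function names find_pull_failures and extract_failed_images are hypothetical helpers. The sketch assumes the default column layout of `kubectl get pods -A --no-headers` (status in the fourth column) and the event message format shown above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any UiPath tooling).

# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "<namespace> <pod>" for every pod whose STATUS column reports an
# image pull failure. Assumes STATUS is the fourth column.
find_pull_failures() {
    awk '$4 == "ImagePullBackOff" || $4 == "ErrImagePull" { print $1, $2 }'
}

# Reads `kubectl describe pod` output on stdin and prints each distinct
# image reference that appears in a "Back-off pulling image" event.
extract_failed_images() {
    sed -n 's/.*Back-off pulling image "\([^"]*\)".*/\1/p' | sort -u
}

# Example usage on a cluster node (commented out so the file can be sourced safely):
#   kubectl get pods -A --no-headers | find_pull_failures
#   kubectl -n uipath describe pod <pod-name> | extract_failed_images
```

Both helpers only parse text from standard input, so they can be tried against saved command output before running them on the cluster.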
Resolution:
- Update the DU ML package/model version from 23.4.2 to 23.4.3, or
- Download the du-semistructured 23.4.2 bundle at this link, and install it by following the instructions in ML - Install Offline Official Bundle.
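After applying either resolution, it can help to confirm that the image tag the pipeline requests is actually present locally. The sketch below is an illustration only. image_tag_available is a hypothetical helper, and it assumes the default `podman images` column layout (repository in the first column, tag in the second).

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): reads `podman images` output on
# stdin and exits 0 if a row matches the given repository and tag, else 1.
# Assumes REPOSITORY is column 1 and TAG is column 2.
image_tag_available() {
    awk -v repo="$1" -v tag="$2" '
        $1 == repo && $2 == tag { found = 1 }
        END { exit !found }
    '
}

# Example usage (commented out so the file can be sourced safely):
#   sudo podman images \
#     | image_tag_available localhost:30071/aicenter/du-semistructured v23.4.2-rc18 \
#     && echo "image present" || echo "image missing"
```

The repository and tag are compared as exact strings, so the tag must be passed with its full suffix (for example v23.4.2-rc18 rather than 23.4.2).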