Pipeline pending for a long time (more than 3 days)

An AI Center pipeline remains pending for a long time (more than 3 days).

Environment:

AI Center on Automation Suite 23.4.3, Air-gapped

Issue: The pipeline is blocked in a pending state and has not completed for more than 3 days.

Analysis:

  • No GPU is used
  • The ML package/model version in the pipeline is 23.4.2

  1. Run the commands below to check general node information, and verify that CPU/memory/disk usage is normal:
  • sudo su
  • export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
  • export PATH="$PATH:/usr/local/bin:/var/lib/rancher/rke2/bin"
  • kubectl get nodes
  • kubectl describe node
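For a cluster with many nodes, the node listing can be filtered instead of read by eye. The following is a minimal sketch, not part of the original procedure: `not_ready_nodes` is a hypothetical helper that reads `kubectl get nodes` output on stdin, and the sample node names and versions are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every node whose STATUS column
# is anything other than exactly "Ready". Expects `kubectl get nodes`
# output (with its header line) on stdin.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Made-up sample input for illustration only; on a live cluster run:
#   kubectl get nodes | not_ready_nodes
not_ready_nodes <<'EOF'
NAME      STATUS     ROLES    AGE   VERSION
server0   Ready      master   90d   v1.0.0
agent1    NotReady   <none>   90d   v1.0.0
EOF
```

An empty result means every node reports plain `Ready`. A node marked `Ready,SchedulingDisabled` is also printed, since it is not fully available for scheduling.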

  2. Run the following command to list all pods:
  • kubectl get pods -A

Observe that one pipeline pod is in ImagePullBackOff status:

uipath 69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc 0/1 ImagePullBackOff 0 3d20h
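The full pod listing is long, so it can help to filter it down to pods that are failing to pull an image. The following is a minimal sketch: `find_pull_failures` is a hypothetical helper name, and it assumes the default `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "namespace/pod" for every pod whose STATUS
# column ends in ImagePullBackOff or ErrImagePull (this also matches
# init-container states such as Init:ImagePullBackOff).
# Expects `kubectl get pods -A` output on stdin.
find_pull_failures() {
  awk '$4 ~ /(ImagePullBackOff|ErrImagePull)$/ { print $1 "/" $2 }'
}

# Sample input taken from the output above; on a live cluster run:
#   kubectl get pods -A | find_pull_failures
find_pull_failures <<'EOF'
NAMESPACE NAME READY STATUS RESTARTS AGE
uipath 69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc 0/1 ImagePullBackOff 0 3d20h
EOF
```

With the sample line above, this prints `uipath/69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc`.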

  3. Run the following command to inspect this pod:
  • kubectl -n uipath describe pod 69db1faa-b7fa-4f68-8751-6362edf5c5c5-2gbwc

Observe the events at the end of the output. They show that the pod requires the image du-semistructured:v23.4.2-rc18 and that the kubelet repeatedly fails to pull it:

Type    Reason   Age                       From     Message
----    ------   ----                      ----     -------
Normal  Pulling  18m (x1705 over 6d1h)     kubelet  Pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
Normal  BackOff  3m25s (x38513 over 6d1h)  kubelet  Back-off pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
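The image reference can also be pulled out of the events programmatically rather than copied by hand. The following is a minimal sketch: `failing_images` is a hypothetical helper that assumes the kubelet back-off message format shown above, with the image reference in double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each distinct image named in a
# 'Back-off pulling image "..."' event. Expects `kubectl describe pod`
# output on stdin.
failing_images() {
  # Split each matching line on double quotes; field 2 is the image reference.
  awk -F'"' '/Back-off pulling image/ { print $2 }' | sort -u
}

# Sample input taken from the events above; on a live cluster run:
#   kubectl -n uipath describe pod <pod-name> | failing_images
failing_images <<'EOF'
Normal Pulling 18m (x1705 over 6d1h) kubelet Pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
Normal BackOff 3m25s (x38513 over 6d1h) kubelet Back-off pulling image "localhost:30071/aicenter/du-semistructured:v23.4.2-rc18"
EOF
```

With the sample events, this prints `localhost:30071/aicenter/du-semistructured:v23.4.2-rc18`.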

  4. Run the command below to list all available container images (for air-gapped installations):
  • sudo podman images

Observe that the available version of the du-semistructured image is v23.4.3-rc21, not the required v23.4.2-rc18:

localhost:30071/aicenter/du-semistructured v23.4.3-rc21 e1ca2d9a1dcb 7 weeks ago 20.8 GB
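The comparison between the required image and the locally available images can be scripted. The following is a minimal sketch: `image_available` is a hypothetical helper that assumes the default `podman images` layout, with the repository in the first column and the tag in the second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given repository and tag
# appear together on one line of `podman images` output read from stdin.
image_available() {
  local repo="$1" tag="$2"
  awk -v r="$repo" -v t="$tag" '$1 == r && $2 == t { found = 1 } END { exit !found }'
}

# Values taken from the output above; on a live system replace the
# printf with:  sudo podman images
images='localhost:30071/aicenter/du-semistructured v23.4.3-rc21 e1ca2d9a1dcb 7 weeks ago 20.8 GB'
required='localhost:30071/aicenter/du-semistructured:v23.4.2-rc18'

repo="${required%:*}"   # everything before the last colon
tag="${required##*:}"   # everything after the last colon

if printf '%s\n' "$images" | image_available "$repo" "$tag"; then
  echo "present: $required"
else
  echo "missing: $required"
fi
```

With the values above, the check reports the v23.4.2-rc18 image as missing, which matches the ImagePullBackOff status of the pod.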

Resolution:

  1. Update the DU ML package/model version from 23.4.2 to 23.4.3, or
  2. Download the du-semistructured 23.4.2 bundle at this link, and install it by following the instructions in ML - Install Offline Official Bundle.