ImagePullBackOff Error

How do you fix an ImagePullBackOff error?

Issue Description

A pod enters this state when the image used by one of its containers cannot be downloaded.

Background

A pod consists of one or more containers. Before a pod can start, the image for each of its containers must be downloaded. This error means that the image download failed.

It is most common in airgapped scenarios.

Diagnosing / Resolving

  1. For each of these steps, capture screenshots or the output of the step. This can be very helpful if a ticket is raised with UiPath.
  2. A hard-to-discover issue can occur with VMs hosted in VMware, mostly in airgapped environments. As a precaution in VMware environments, try the following (must be run on all nodes):
    1. ethtool -K ens192 tx-checksum-ip-generic off
    2. kubectl -n kube-system rollout restart ds/cilium
      • For older versions 21.10-22.4: kubectl -n kube-system rollout restart ds/rke2-canal
    3. For server: systemctl restart rke2-server
    4. For agent: systemctl restart rke2-agent
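
To confirm that the offload change in step 2 took effect on a node, the setting can be read back with ethtool -k (lowercase k), which lists the interface's offload features. The following is a minimal sketch, not an official procedure. The function name checksum_offload_state is hypothetical, and ens192 is the interface name used above, which may differ on your VMs.

```shell
# Reads `ethtool -k <interface>` output on stdin and prints the state
# ("on" or "off") of the tx-checksum-ip-generic feature.
# checksum_offload_state is a hypothetical helper name.
checksum_offload_state() {
  awk '$1 == "tx-checksum-ip-generic:" { print $2 }'
}

# Example use on a node (interface name is an assumption):
#   ethtool -k ens192 | checksum_offload_state
```

After the ethtool -K command from step 2 has been applied, the helper should print off for that interface.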
  3. For airgapped environments, check if the docker registry pod is running:
    1. Run: kubectl -n docker-registry get pods -o wide
    2. Verify that the pod is in a Running state. If it is not, this is most likely the cause of the issue. To address this, see: Troubleshooting Container Not Starting
    3. If the docker registry pod is running, try to access the docker registry from the node that it is running on. The output of the first command (which gets the status of the pod) also shows its node name. To connect to the registry, run: curl -vk https://localhost:30071
    4. If the connection does not succeed, generate a support bundle and open a ticket. (A failed connection at this point indicates a container networking issue.)
    5. If the connection succeeds, try from the node where the pod is failing.
      1. To find the node it is hosted on, check the summary in Argo CD: How To Debug Issues Using Argo CD?
      2. Or run: kubectl get pods -A -o wide | grep -v Running | grep -v Complete
        • Output should show the node name.
    6. If the connection succeeds from the node where the pod is failing, the issue should resolve itself. To expedite this, delete the pod:
      1. kubectl -n <namespace> delete pod <pod-name>
      2. If it still does not resolve itself, go to the next step (step 4).
    7. If the connection does not succeed from the node where the pod is failing, generate a support bundle (for investigation) and then restart the RKE2 service (only applies to multi-node):
      1. sudo /opt/node-drain.sh
      2. sudo rke2-killall.sh
      3. For server: sudo systemctl restart var-lib-rancher-rke2-server-db.mount
      4. For server: systemctl restart rke2-server
      5. For agent: systemctl restart rke2-agent
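
As an alternative to the grep pipeline in step 3, failing pods and their nodes can be listed by pod phase. The following is a minimal sketch: the function name pods_not_running is hypothetical, and the column layout is one chosen for this example through kubectl's custom-columns output. A pod that is stuck pulling its image normally reports the phase Pending.

```shell
# Reads a pod table with the columns NAMESPACE, NAME, PHASE, NODE on stdin
# (header line first) and prints every pod whose phase is neither
# Running nor Succeeded.
# pods_not_running is a hypothetical helper name.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Succeeded" { print $1, $2, $3, $4 }'
}

# Example use:
#   kubectl get pods -A \
#     -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,NODE:.spec.nodeName \
#     | pods_not_running
```

The last column of each printed line is the node on which to repeat the curl check against the registry.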
  4. If the environment is not airgapped, or the above steps did not help, check where the image is being pulled from:
    1. kubectl -n <namespace> describe pod <pod-name>
    2. This should show events related to the image that cannot be pulled. (Hint: if there is no event, one can usually be regenerated by deleting the pod.)
    3. Here is an example event:
      • Failed to pull image "fake.registryuipath.com/test": rpc error: code = Unknown desc = failed to pull and unpack image "fake.registryuipath.com/test:latest": failed to resolve reference "fake.registryuipath.com/test:latest": failed to do request: Head "https://fake.registryuipath.com/v2/test/manifests/latest": dial tcp: lookup fake.registryuipath.com on 168.63.129.16:53: no such host
    4. Sometimes the event explains the issue. In the above example, the image reference points to a registry that does not exist. Usually the cause is a connectivity issue, which can be tested by connecting to the endpoint manually:
      1. curl -vk https://<registry-host>
      2. For example, based on the event above, the command would be: curl -vk https://fake.registryuipath.com
      3. Querying the endpoint manually can sometimes reveal network issues.
    5. Another possible error might be something like:
      1. Failed to pull image "registry.uipath.com/nonexistentimage": rpc error: code = NotFound desc = failed to pull and unpack image "registry.uipath.com/nonexistentimage:latest": failed to resolve reference "registry.uipath.com/nonexistentimage:latest": registry.uipath.com/nonexistentimage:latest: not found
      2. In the above, the issue is that the image does not exist (not found).
      3. If this is seen, contact UiPath.
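
The two example events above can be told apart mechanically. The following is a minimal sketch that extracts the registry host from a "Failed to pull image" message and maps the message to a rough cause. Both function names are hypothetical, only the two message patterns shown above are recognized, and the host extraction assumes the image reference begins with a registry host name, as it does in both examples.

```shell
# Prints the registry host from a "Failed to pull image" event message.
# Prints nothing if the image reference contains no "/" separator.
# registry_host_from_event is a hypothetical helper name.
registry_host_from_event() {
  printf '%s\n' "$1" | sed -n 's|.*Failed to pull image "\([^"/]*\)/.*|\1|p'
}

# Maps an event message to a rough cause. Covers only the two example
# messages from this article; anything else is reported as "unknown".
# pull_failure_cause is a hypothetical helper name.
pull_failure_cause() {
  case "$1" in
    *"no such host"*) echo "dns-lookup-failed" ;;
    *"not found"*)    echo "image-not-found" ;;
    *)                echo "unknown" ;;
  esac
}
```

The extracted host is the value to use in the manual curl check from step 4.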
  5. If none of the above helps, contact UiPath. Provide the support bundle (if possible) and the outputs of the steps above.
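
When recording the outcome of the curl checks from steps 3 and 4 for a ticket, the curl exit code gives a compact summary of what went wrong. The following is a minimal sketch that translates the most relevant codes, as listed under EXIT CODES in the curl man page. The function name describe_curl_exit is hypothetical, and the 10-second timeout in the usage comment is an assumption.

```shell
# Translates a curl exit code into a short description.
# describe_curl_exit is a hypothetical helper name.
describe_curl_exit() {
  case "$1" in
    0)  echo "request completed" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS handshake failed" ;;
    *)  echo "other curl error (exit code $1)" ;;
  esac
}

# Example use on a node:
#   curl -vk --max-time 10 https://localhost:30071; describe_curl_exit "$?"
```

An exit code of 6 points to name resolution, while 7 and 28 point to the network path between the node and the registry.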
