Troubleshooting Container Not Starting

What are the possible reasons a pod does not start, and how do you troubleshoot it?

Issue Description: How to troubleshoot a container or pod that is not starting.

Root Cause / Background: A container can fail to start for multiple reasons: it may be unable to initialize, or it may start but then fail immediately. The following steps help diagnose the issue.

Diagnosing

  1. There are two ways to find out whether a pod is not starting:
    1. The first is to use Argo CD. See: How To Debug Issues Using Argo CD?
      • In particular, looking for degraded apps typically leads to the issue. That KB article gives an example of tracing the failing component to the rook-ceph-rgw pod.
    2. The second way is to use kubectl commands.
      1. Log in to one of the master nodes.
      2. Enable kubectl. See: Enabling kubectl
      3. Run the following command:
        • kubectl get pods -A | grep -v Running | grep -v Complete
      4. The command above lists the pods whose state is neither Running nor Completed. Some of these pods may simply still be starting up. Pods that show Running but are not fully ready (for example, 0/1) are not listed.
  2. Next, determine what states the pods are in.
    1. In Argo CD, the state is displayed in the UI as part of the pod's icon.
      • The following example is taken from the How To Debug Issues Using Argo CD? KB article:
        [Image: Argo CD view of the unhealthy pod from the KB example]
      • In the image above, the broken-heart icon indicates that the pod is unhealthy.
      • The lower right-hand side shows information about the state:
        1. The pod was created an hour ago.
        2. It is currently in the Running state.
        3. However, 0/1 means that 0 of the pod's 1 container is ready, so the pod is not considered healthy or fully started.
        4. The number 13 is the restart count.
        5. Because Kubernetes is self-healing, it keeps restarting the container, so the state cycles: Running -> Error -> CrashLoopBackOff -> Running.
    2. If you are using kubectl, the STATUS column of the command output shows the state of each pod.
  3. Depending on the state, there are different actions to take. The following states are common when a container is not starting:
    1. Pending - This state means the pod could not be scheduled.
      1. Argo: Check the events.
      2. Kubectl: kubectl -n <namespace> describe pod <pod-name>
    2. ContainerCreating - A container stuck in this state usually points to an issue with pulling an image or containerd. The events should explain the issue.
      1. Argo: Check the events.
      2. Kubectl: kubectl -n <namespace> describe pod <pod-name>
    3. CrashLoopBackOff
      • Argo:
        1. Check the pod logs.
        2. If the logs do not end with an exception, check the events.
        3. If the events do not explain why the pod is crashing, check the container status under Summary -> Live Manifest and note the exit code. (When the events do explain the crash, the cause is typically a failed health check.)
        4. If the exit code is greater than 127, see: Pod Exits With No Error Message.
      • Kubectl:
        1. Check the pod logs: kubectl -n <namespace> logs <pod-name>
        2. If the logs do not end with an exception, check the events: kubectl -n <namespace> describe pod <pod-name>
        3. If the events do not explain why the pod is crashing (when they do, the cause is typically a failed health check), check the container status: in the describe output, look at the Last State and Exit Code fields of the container.
        4. If the exit code is greater than 127, see: Pod Exits With No Error Message.
    4. ImagePullBackOff and ErrImagePull
      1. Argo: Check the events.
      2. Kubectl: kubectl -n <namespace> describe pod <pod-name>
    5. PodInitializing
      1. Argo: Check the events.
      2. Kubectl: kubectl -n <namespace> describe pod <pod-name>
    6. Evicted
      1. Argo: Check the events.
      2. Kubectl: kubectl -n <namespace> describe pod <pod-name>
      3. In most of these cases, check alerts. Evicted pods indicate a problem with the node.
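The grep filter in the kubectl steps hides every pod whose STATUS is Running, including a pod like the Argo CD example above that is Running but only 0/1 ready. The following is a minimal sketch, assuming a POSIX shell with awk on the master node and the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name find_unhealthy_pods is hypothetical. It reads the command output on stdin and prints pods that are not Completed and are either not Running or not fully ready.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): reads `kubectl get pods -A` output
# on stdin and prints pods that are not Completed and are either not Running
# or not fully ready. Assumes the default columns:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
find_unhealthy_pods() {
  awk 'NR > 1 && $4 != "Completed" {
    # READY has the form ready/total, for example 0/1
    split($3, ready, "/")
    if ($4 != "Running" || ready[1] + 0 < ready[2] + 0)
      print $1, $2, $3, $4, $5
  }'
}
```

Usage: kubectl get pods -A | find_unhealthy_pods. Because the filter compares the two halves of READY, a Running pod at 0/1 is reported, while a Completed job pod at 0/1 is skipped.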

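The list of states in step 3 is essentially a lookup from pod state to a first diagnostic command. A minimal sketch of that lookup, assuming a POSIX shell; the function next_step is hypothetical and only prints the command rather than running it. For CrashLoopBackOff it adds kubectl's --previous flag, which returns the logs of the previous (crashed) instance of the container; that flag is an addition to the command in the list and fails if the container has not restarted yet, in which case drop it.

```shell
#!/bin/sh
# Hypothetical helper: prints the first kubectl command to run for a pod,
# following the state list in step 3. It prints commands; it does not run them.
# Usage: next_step <namespace> <pod-name> <status>
next_step() {
  ns="$1"; pod="$2"; status="$3"
  case "$status" in
    CrashLoopBackOff)
      # Start with the logs; --previous shows the last crashed container instance
      echo "kubectl -n $ns logs $pod --previous"
      ;;
    Pending|ContainerCreating|ImagePullBackOff|ErrImagePull|PodInitializing|Evicted)
      # For these states the events usually explain the problem
      echo "kubectl -n $ns describe pod $pod"
      ;;
    *)
      echo "unrecognized state $status: start with kubectl -n $ns describe pod $pod"
      ;;
  esac
}
```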

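The CrashLoopBackOff checks single out exit codes greater than 127. By the usual shell and container-runtime convention, an exit code above 128 means the process was killed by signal number (exit code - 128), so 137 is 128 + 9 (SIGKILL) and 143 is 128 + 15 (SIGTERM). The sketch below, assuming a POSIX shell, decodes a code; the function explain_exit_code and its messages are hypothetical, and only a few common signals are named.

```shell
#!/bin/sh
# Hypothetical helper: explains a container exit code.
# Convention: codes above 128 mean the process was killed by signal (code - 128).
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exit code 0: the container exited successfully"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # Name a few common signals; fall back to the raw signal number
    case "$sig" in
      6) name="SIGABRT" ;;
      9) name="SIGKILL" ;;
      11) name="SIGSEGV" ;;
      15) name="SIGTERM" ;;
      *) name="signal $sig" ;;
    esac
    echo "exit code $code: killed by $name (128 + $sig)"
  else
    echo "exit code $code: the application exited with an error; check the logs"
  fi
}
```

To read the exit code of the last terminated container instance directly: kubectl -n <namespace> get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}', then pass the value to explain_exit_code. A 137 often appears together with the reason OOMKilled in the describe output.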

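Evicted pods indicate a problem with the node, so a useful first question is which node they came from. A minimal sketch, assuming a POSIX shell with awk and sort: it reads the output of kubectl get pods -A -o custom-columns=NODE:.spec.nodeName,REASON:.status.reason and counts, per node, the pods whose reason is Evicted. The function name count_evicted_by_node is hypothetical; it takes the node and reason from the last two columns so that an extra leading column does not break it.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   kubectl get pods -A -o custom-columns=NODE:.spec.nodeName,REASON:.status.reason
# on stdin and prints the number of evicted pods per node, highest first.
count_evicted_by_node() {
  awk 'NR > 1 && $NF == "Evicted" { count[$(NF - 1)]++ }
       END { for (node in count) print count[node], node }' | sort -rn
}
```

The node at the top of the list is the first candidate to check against the alerts.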
Note:
A failed volume attachment shows up in the pod events like this (the pod stays in ContainerCreating until the volume is attached):

Warning FailedAttachVolume 11s (x6 over 30s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-6459e119-f581-48a0-8f85-4f674b2ae9fb" : rpc error: code = DeadlineExceeded desc = volume pvc-6459e119-f581-48a0-8f85-4f674b2ae9fb failed to attach to node XXXXXXX

See: How To Fix Looping PVC?
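Before following that KB article, it helps to know which volume is affected. A minimal sketch, assuming a POSIX shell with a grep that supports -o (GNU and BSD grep both do): it reads pod events, such as the Events section of kubectl -n <namespace> describe pod <pod-name>, and prints the distinct pvc-... names mentioned in FailedAttachVolume warnings. The function name failed_attach_pvcs is hypothetical. For dynamically provisioned volumes, the pvc-... name in this event is the PersistentVolume name, so it can be passed to kubectl get pv.

```shell
#!/bin/sh
# Hypothetical helper: reads pod events on stdin and prints the distinct
# pvc-... volume names that appear in FailedAttachVolume warnings.
failed_attach_pvcs() {
  grep 'FailedAttachVolume' | grep -o 'pvc-[0-9a-f-]*' | sort -u
}
```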

ImagePullBackOff or ErrImagePull: See ImagePullBack Error In Airgapped Installation for more details.
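To check an airgapped registry, you first need the exact image reference that failed. A minimal sketch, assuming a POSIX shell with sed and kubelet event messages of the form Failed to pull image "<image>": ... (an assumption about the message text, not a documented interface). It reads pod events on stdin and prints the distinct image references that could not be pulled. The function name failed_images is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads pod events on stdin and prints the distinct image
# references that failed to pull. Assumes messages like:
#   Failed to pull image "<image>": ...
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

Usage: kubectl -n <namespace> describe pod <pod-name> | failed_images, then compare each reference with the images available in the airgapped registry as described in the KB article above.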