How do you move specific pods to a different node (especially when a specific node is overloaded)?
Issue Description
It is not possible to "move" a pod to a different node once it has been scheduled on a specific node. Therefore, this article is technically not about how to "move" a pod; it describes how to delete a pod in such a way that it is scheduled on a different node when it comes back up (without this procedure, the replacement pod may be scheduled on the same node again).
Notes:
- This is not best practice. By design, Kubernetes schedules pods on nodes based on the resources that each pod is configured to consume (unless taints, tolerations, or selectors also affect the scheduling), so the load between nodes should eventually balance out. Use this article only when manual intervention is necessary.
- The service that the target pod is associated with may experience a few minutes of downtime when the pod is deleted and re-scheduled.
Resolution
- Choose target pod(s)
- Choose a pod (preferably in the uipath namespace) that is not critical to cluster functionality or to a specific service's functionality (for example, robotube, webhook, etc.).
- Choose a pod that is scheduled on the node that needs manual intervention (which will probably already have been decided when viewing this KB).
- The command kubectl get pods -A -o wide will help determine the namespace and node that the pod is located in.
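To narrow the list to a single node, the following minimal sketch wraps the same command with a field selector on spec.nodeName. The function name pods_on_node and the node name worker-1 are hypothetical and used only for illustration.

```shell
# Hypothetical helper: list every pod (all namespaces) scheduled on one node.
# $1 = node name, as shown by "kubectl get nodes".
pods_on_node() {
  kubectl get pods -A -o wide --field-selector "spec.nodeName=$1"
}

# Example usage (worker-1 is a placeholder node name):
#   pods_on_node worker-1
```

The NAMESPACE and NAME columns of the output give the values needed later when deleting the pod.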
- Cordon node
- List nodes using kubectl get nodes, to obtain the name of the node that is to be cordoned.
- Run kubectl cordon <node-name> and replace <node-name> with the actual name of the node from the previous get nodes command.
- When cordoned successfully, the command will return: node/<node-name> cordoned
- Running kubectl get nodes again will now show the cordoned node's STATUS as Ready,SchedulingDisabled (instead of just Ready).
- Now that scheduling is disabled on that specific node, when the target pod is deleted, it will be re-scheduled on a different node that is ready and schedulable.
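The cordon step and its check can be sketched as a small shell function. This is an illustration rather than part of the original procedure: the function name cordon_and_verify is hypothetical, and it assumes that a cordoned node reports spec.unschedulable as true, which is the field behind the SchedulingDisabled status.

```shell
# Hypothetical helper: cordon a node, then confirm that scheduling is disabled.
# $1 = node name. Returns 0 only if the node reports unschedulable=true.
cordon_and_verify() {
  kubectl cordon "$1" || return 1
  # A cordoned node has .spec.unschedulable set to true.
  unschedulable="$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')"
  [ "$unschedulable" = "true" ]
}

# Example usage (worker-1 is a placeholder node name):
#   cordon_and_verify worker-1 && echo "worker-1 is cordoned"
```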
- Delete pod
- Run kubectl -n <namespace> delete pod <pod-name> and replace <namespace> and <pod-name> with the corresponding values obtained from the last command in step 1 (kubectl get pods -A -o wide).
- When deleted successfully, the command will return: pod "<pod-name>" deleted
- Uncordon node
- Wait for the deleted pod to be successfully created on a different node. This process can be monitored by running kubectl -n <namespace> get pods -w -o wide. Alternatively, running kubectl -n <namespace> get pods -o wide a few times will have the same effect.
- Once the pod is in Running state on a different node, run kubectl uncordon <node-name> to uncordon the node, making it schedulable again.
- Depending on the cluster settings, the node may already be uncordoned after the pod is deleted and rescheduled, in which case the command will return: node/<node-name> already uncordoned. Running kubectl get nodes before the uncordon command is another way to check whether the node has already been uncordoned, but this is unnecessary because the uncordon command does nothing if the node is already uncordoned.
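The whole procedure can be sketched as one shell function. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name move_pod_off_node, the example values, and the optional label selector argument are all hypothetical. The selector is needed because a replacement pod created by a controller usually gets a new name, so it cannot be waited on by the old name. The sketch also assumes the controller creates the replacement almost immediately; if no matching pod exists yet, kubectl wait can fail, in which case the manual monitoring described above still applies.

```shell
# Hypothetical helper: cordon a node, delete one pod, optionally wait for the
# replacement to become Ready, then uncordon the node again.
# $1 = node name, $2 = namespace, $3 = pod name,
# $4 = optional label selector matching the replacement pod (assumption).
move_pod_off_node() {
  node="$1"
  namespace="$2"
  pod="$3"
  selector="${4:-}"

  kubectl cordon "$node" || return 1

  kubectl -n "$namespace" delete pod "$pod"
  rc=$?

  if [ "$rc" -eq 0 ] && [ -n "$selector" ]; then
    # Wait up to 5 minutes for pods matching the selector to be Ready.
    kubectl -n "$namespace" wait --for=condition=Ready pod -l "$selector" --timeout=300s
    rc=$?
  fi

  # Show where the pods ended up, then always make the node schedulable again.
  kubectl -n "$namespace" get pods -o wide
  kubectl uncordon "$node"
  return "$rc"
}

# Example usage (all values are placeholders):
#   move_pod_off_node worker-1 uipath webhook-abc123 app=webhook
```

The node is uncordoned even when the delete or the wait fails, so that a failed run does not leave the node unschedulable.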