If the installation of the GPU node is successful but the node still isn't scheduling tasks with the GPU
Description:
Use the following steps when the GPU node installation is successful, but the node does not schedule skills on the GPU and they remain in a pending state:
- Check if the NVIDIA driver is installed on the GPU node by executing the following command:
nvidia-smi
- The output should show a table listing the driver version and the GPUs detected on the node.
- If the NVIDIA driver is not installed, install it before proceeding by following the steps provided here.
- Check if the GPU node is joined to the server node by running the following command:
kubectl get nodes
- The GPU node should be listed with the status Ready.
- To enable the GPU node, follow the steps below:
Log in to any server node, navigate to the installer folder (UiPathAutomationSuite), and enable the GPU by executing the following commands:
cd /opt/UiPathAutomationSuite
sudo ./configureUiPathAS.sh gpu Enable
- Check whether the GPU now appears among the node's resources by running the following command:
kubectl describe node
- The output should list nvidia.com/gpu under Capacity and Allocatable.
- If nvidia.com/gpu is not present, run the following commands on the GPU node:
awk '1;/plugins."io.containerd.grpc.v1.cri".containerd]/{print " default_runtime_name = \"nvidia\""}' /var/lib/rancher/rke2/agent/etc/containerd/config.toml > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
systemctl stop rke2-agent
rke2-killall.sh
systemctl start rke2-agent
- To verify whether the GPU resource appears, execute the following command:
kubectl describe node
- Confirm that nvidia.com/gpu is now present in the output.
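
The fix and the final check above can be sketched as two small shell helpers. This is only an illustrative sketch, not part of the installer: the function names add_nvidia_default_runtime and has_gpu_resource are hypothetical. The first wraps the same awk program used above, which copies every line of the containerd configuration and adds a default_runtime_name line right after the [plugins."io.containerd.grpc.v1.cri".containerd] section header. The second searches the text of kubectl describe node for nvidia.com/gpu. Both read from standard input, so they can be tried on sample text before touching a real node.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not part of Automation Suite).

# Reads a containerd config.toml on stdin and writes the template on stdout.
# Same awk program as in the steps above: "1" prints every input line, and the
# second rule prints the default_runtime_name line after the section header.
add_nvidia_default_runtime() {
  awk '1;/plugins."io.containerd.grpc.v1.cri".containerd]/{print " default_runtime_name = \"nvidia\""}'
}

# Reads the output of "kubectl describe node" on stdin.
# Exits with status 0 if the nvidia.com/gpu resource is mentioned, 1 otherwise.
has_gpu_resource() {
  grep -q 'nvidia\.com/gpu'
}

# Example use on the GPU node (paths taken from the steps above):
#   add_nvidia_default_runtime \
#     < /var/lib/rancher/rke2/agent/etc/containerd/config.toml \
#     > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
#
# Example use where kubectl is available:
#   kubectl describe node | has_gpu_resource && echo "nvidia.com/gpu is present"
```

Because the helpers only transform or inspect text, they do not restart rke2-agent. The stop, rke2-killall.sh, and start commands from the steps above are still required after the template file is written.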