Configuring and validating GPU node

The GPU node installation succeeds, but the node still does not schedule tasks on the GPU.


Description:

Follow these steps when the GPU node installation succeeds, but skills that need the GPU are not scheduled on the node and remain in a Pending state:

  • Check if the NVIDIA driver is installed on the GPU node by executing the following command:
nvidia-smi
  • The output should list the installed driver version and the GPUs detected on the node.
  • If the NVIDIA driver is not installed, install it before proceeding by following the steps provided here.
  • Check if the GPU node is joined to the server node by running the following command:
kubectl get nodes
  • The GPU node should be in the Ready state.
  • To enable the GPU node, log in to any server node, navigate to the installer folder (UiPathAutomationSuite), and run the following commands:
cd /opt/UiPathAutomationSuite
sudo ./configureUiPathAS.sh gpu Enable
  • Check whether the GPU now appears among the node's resources by running the following command:
kubectl describe node 
  • The Capacity and Allocatable sections of the output should include nvidia.com/gpu.
  • If nvidia.com/gpu is not present, run the following commands on the GPU node:
awk '1;/plugins."io.containerd.grpc.v1.cri".containerd]/{print " default_runtime_name = \"nvidia\""}' /var/lib/rancher/rke2/agent/etc/containerd/config.toml > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
systemctl stop rke2-agent
rke2-killall.sh
systemctl start rke2-agent
  • To verify whether the GPU resource appears, execute the following command:
kubectl describe node 
  • Confirm that nvidia.com/gpu is now present in the output.
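
The driver check and the resource check above can be scripted. The following is a minimal sketch, not part of the installer: the function names gpu_resource_present and driver_installed are hypothetical, and the parsing assumes that kubectl describe node prints the resource in its Capacity or Allocatable section as "nvidia.com/gpu:" followed by a count.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions for the manual checks described above.
# The function names are hypothetical and are not shipped with the installer.

# Reads `kubectl describe node` output on stdin.
# Succeeds (exit 0) if a line of the form "nvidia.com/gpu: <count>" has a
# count greater than zero; fails (exit 1) otherwise.
gpu_resource_present() {
  awk '$1 == "nvidia.com/gpu:" && $2 + 0 > 0 { found = 1 } END { exit !found }'
}

# Succeeds if nvidia-smi is on PATH and runs without error on this node.
driver_installed() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}
```

For example, on a server node, run kubectl describe node "$NODE" | gpu_resource_present && echo "GPU resource present", where NODE is a shell variable that holds the name of the GPU node. On the GPU node itself, run driver_installed || echo "NVIDIA driver missing or not working".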