Resolution when a pipeline fails with the error "All CUDA capable devices are busy or unavailable".
Root Cause: This error means that either something is misconfigured in AI Center or the graphics card is not working correctly.
Diagnosing
- Run nvidia-smi
- It should report that no processes are running
- If processes are running, that may be the cause of the issue. Make sure no other skills are currently using the GPU
- If the command returns an error, check the NVIDIA documentation for how to fix the issue.
- HINT: Sometimes driver updates can cause nvidia-smi to error.
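The check above can be scripted. A minimal sketch, assuming `nvidia-smi` is on the PATH (the `--query-compute-apps` flags are standard nvidia-smi query options, though output formatting can vary by driver version):

```shell
# Report whether any compute processes currently hold the GPU.
if ! command -v nvidia-smi >/dev/null 2>&1; then
  gpu_state="nvidia-smi not found - check the driver installation"
elif procs=$(nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader 2>/dev/null); then
  if [ -z "$procs" ]; then
    gpu_state="idle - no compute processes running"
  else
    gpu_state="busy: $procs"
  fi
else
  gpu_state="nvidia-smi returned an error - see the NVIDIA documentation"
fi
echo "$gpu_state"
```

An empty process query is the expected healthy state; anything listed under "busy" is a candidate culprit.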
- If no processes are running, run the following command:
- nvidia-smi -q
- This command includes the licensing status in its output. Make sure the GPU is licensed
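To pull just the licensing lines out of the full `nvidia-smi -q` dump, a grep filter works. The sketch below runs the filter over a saved sample snippet so it is self-contained; in practice, pipe the live command output instead (the field names shown are an assumption based on vGPU driver builds):

```shell
# Filter the licensing section from nvidia-smi -q output.
# $sample stands in for live `nvidia-smi -q` output here.
sample='vGPU Software Licensed Product
    License Status            : Licensed'
printf '%s\n' "$sample" | grep -i 'license status'
```

A status other than "Licensed" points at the licensing problem described below.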
- Check /var/log/messages
- grep 'nvidia' /var/log/messages
- If there are errors, check the NVIDIA documentation to see if it's a known issue.
- For example, if the GPU is not getting a license, this error will be present: nvidia-gridd: Failed to acquire/renew license from license server
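The licensing failure quoted above can be detected with a grep over the log. A sketch, demonstrated here on a sample log line so it runs anywhere (against a real host, grep /var/log/messages instead; on Debian/Ubuntu systems the equivalent log is /var/log/syslog):

```shell
# Count occurrences of the nvidia-gridd licensing failure.
pattern='nvidia-gridd: Failed to acquire/renew license'
# Sample line standing in for a real /var/log/messages entry:
sample='Jun  1 12:00:00 host nvidia-gridd: Failed to acquire/renew license from license server'
echo "$sample" | grep -c "$pattern"
# -> 1
```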
- If everything on the NVIDIA side looks good, capture a support bundle and raise the issue with UiPath
- 2021.4 and earlier: v2021.4 Support Docs
- 2021.10 and later: Using Support Bundle Tool
- Include the output of the nvidia-smi and nvidia-smi -q commands.
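The diagnostics above can be captured into files for the support bundle in one pass. A sketch (the file names are illustrative; each command falls back to a note if it cannot run, so the files always exist):

```shell
# Save GPU diagnostics to attach to the support bundle.
nvidia-smi    > nvidia-smi.txt    2>&1 || echo 'nvidia-smi unavailable' >> nvidia-smi.txt
nvidia-smi -q > nvidia-smi-q.txt  2>&1 || echo 'nvidia-smi unavailable' >> nvidia-smi-q.txt
grep -i 'nvidia' /var/log/messages > nvidia-messages.txt 2>&1 \
  || echo 'no matches, or log not present on this distro' >> nvidia-messages.txt
```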