ML Skills deployment with GPU

When deploying multiple ML Skills, it seems that only one of the skills can have GPU activated.
Is this expected behavior, or is the installation missing some components (for example: docker pull nvidia/k8s-device-plugin:1.9)?
This is an on-prem online installation behind a proxy. Error message: Unschedulable 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M5000        Off  | 00000000:03:00.0 Off |                  Off |
| 39%   44C    P8    22W / 150W |      3MiB /  8126MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Yes, that’s right: GPUs are not sharable among several pods. When you deploy the first Skill or pipeline with GPU enabled, the GPU is bound (“mounted”) to that pod, so we are sure the GPU will always be available for that Skill/pod.

More documentation on this: Schedule GPUs | Kubernetes (it is not possible to request a fraction of a GPU).
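As a sketch of what this looks like in practice (the pod name, container name, and image below are placeholders, not the actual ML Skill manifest): a GPU-enabled pod requests the device as a whole unit under its resource limits, and with a single physical GPU only one such pod can be scheduled at a time.

```yaml
# Hypothetical GPU-enabled skill pod; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: ml-skill-gpu
spec:
  containers:
    - name: ml-skill
      image: example.com/ml-skill:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # whole GPUs only; fractional requests are not allowed
```

Since the node above exposes a single Quadro M5000, a second GPU-enabled skill stays Pending with “0/1 nodes are available: 1 Insufficient nvidia.com/gpu” until the first pod releases the device. You can check what the node advertises with `kubectl describe node` and look for `nvidia.com/gpu` under Capacity/Allocatable.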
