Hi Oliver,
Nvidia's vGPU/MIG are quite popular options, and using them doesn't really require Cyborg - they can be utilized with Nova/Placement alone. However, there are plenty of nuances, as the vGPU implementation also depends on the GPU architecture - Teslas are quite different from Ampere cards in how they are created on the driver side and how they are represented as Placement resources.
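For reference, on the Nova side the whole vGPU setup boils down to a couple of options in nova.conf on the compute node. A minimal sketch for Zed - the mdev type name "nvidia-35" and the PCI address are just examples, the real values depend on your card and driver:

```ini
# /etc/nova/nova.conf on the compute node (Zed option names;
# older releases used [devices] enabled_vgpu_types instead)
[devices]
enabled_mdev_types = nvidia-35

# Optional: pin this mdev type to specific physical GPUs
# (address is a placeholder - check lspci on your node)
[mdev_nvidia-35]
device_addresses = 0000:84:00.0
```

Guests then request a vGPU through a flavor, e.g. `openstack flavor set my-vgpu-flavor --property "resources:VGPU=1"`, and Placement does the rest.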
Also, I'm not sure that desktop cards like the RTX 3050 support vGPUs at all. Most likely the only option for this type of card will be PCI passthrough, which is supported quite well and very easy to implement, as it doesn't require any extra drivers. But if you want to leverage vGPUs/MIG, you will likely need cards like the A10 (which doesn't have MIG support) or the A30. Most of the supported models, along with the possible slices, are mentioned here:
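In case you go the passthrough route, the Nova side is roughly this - a sketch only; the product_id here is a placeholder and must be replaced with the real vendor/product pair from `lspci -nn` on your node:

```ini
# /etc/nova/nova.conf -- PCI passthrough sketch.
# 10de is Nvidia's vendor ID; "1234" is a placeholder product ID.
# passthrough_whitelist was renamed to device_spec in newer releases.
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "1234" }
alias = { "vendor_id": "10de", "product_id": "1234", "device_type": "type-PCI", "name": "my-gpu" }
```

The flavor then requests the whole device via the alias, e.g. `openstack flavor set my-gpu-flavor --property "pci_passthrough:alias"="my-gpu:1"`. Note that with passthrough the VM gets the entire card, so only one guest per GPU.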
Regarding licensing - with the vGPU approach you always license the clients, not the hypervisors. So you don't need any license to create VMs with vGPUs, just the hypervisor driver, which can be downloaded from the Nvidia enterprise portal. You will also be able to test whether vGPU works inside a VM, as a missing license only applies limitations after some time. The license type also depends on the workloads you want to run: for AI training workloads you most likely need a vCS license, but then the vGPUs can be used only for compute, not for virtual desktops.
To be completely frank, if your workloads won't require CUDA support, I would look closely at AMD GPUs, since there is no licensing mess and their implementation of SR-IOV is way more straightforward and clear, at least to me. So if you're looking for GPUs for virtual desktops, that might be a good option for you. However, Nvidia is way more widespread in OpenStack workloads, so you're more likely to get help with (and find gotchas documented for) Nvidia rather than any other GPU.
Dear All,
we are planning to run a PoC on vGPUs in our OpenStack cluster.
I therefore have a few questions, and generally wanted to ask how
well vGPUs are supported in OpenStack. The docs, in particular:
https://docs.openstack.org/nova/zed/admin/virtual-gpu.html
explain quite well the general implementation.
But I am more interested in general experience with using vGPUs in
OpenStack. We currently have a small Yoga cluster, planning to
upgrade to Zed soon, with a couple of compute nodes. Currently our
users use consumer cards like the RTX 3050/3060 in their laptops,
and the idea would be to provide vGPUs to these users. For this I
would like to run a very small PoC where we first equip one
compute node with an Nvidia GPU. A few tips on which card would be
a good starting point would also be highly appreciated. I know
this heavily depends on the server hardware but this is something
I can figure out later. Also, do we need additional software
licenses to run this? I saw this very nice presentation from CERN
on vGPUs:
https://indico.cern.ch/event/776411/contributions/3345183/attachments/1851624/3039917/02_-_vGPUs_with_OpenStack_-_Accelerating_Science.pdf
In the table they list Quadro vDWS licenses.
I assume we need these in order to use the cards? Also, do we need
something like Cyborg for this, or is vGPU support fully
implemented in Nova?
Best Regards,
Oliver