Hi Oliver,

Nvidia's vGPU/MIG are quite popular options, and using them doesn't really require Cyborg - they can be managed solely with Nova/Placement. There are plenty of nuances, though, as the vGPU implementation also depends on the GPU architecture - Tesla-generation cards are quite different from Ampere ones in how the vGPU devices are created on the driver side and how they are represented as Placement resources.
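Just to give a rough picture of the Nova-only flow: you enable the vGPU type in nova.conf on the compute node and then request the resource from a flavor. A minimal sketch - the type name nvidia-35 and the PCI address are only examples, list the real types under /sys/class/mdev_bus/*/mdev_supported_types on your host:

```ini
# nova.conf on the compute node (illustrative values)
[devices]
enabled_vgpu_types = nvidia-35

# On recent releases you can pin a type to specific physical GPUs:
[vgpu_nvidia-35]
device_addresses = 0000:84:00.0
```

A flavor then requests it with something like `openstack flavor set vgpu_test --property "resources:VGPU=1"`, and Placement takes care of scheduling to a host that still has VGPU inventory.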

Also, I'm not sure that desktop cards like the RTX 3050 support vGPU at all. Most likely the only option for this type of card will be PCI passthrough, which is supported quite well and is super easy to set up, as it doesn't require any extra drivers. But if you want to leverage vGPU/MIG, you will likely need cards like the A10 (which doesn't have MIG support) or the A30. Most supported models, along with the possible slices, are listed here:
https://docs.nvidia.com/grid/15.0/grid-vgpu-user-guide/index.html#supported-gpus-grid-vgpu
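For reference, the passthrough path is just a whitelist and an alias in nova.conf plus a flavor property. A minimal sketch - the product_id below is a placeholder, take the real vendor:product pair from `lspci -nn` (10de is Nvidia's vendor ID), and note the alias must also be set on the nodes running nova-api:

```ini
# nova.conf (illustrative; replace XXXX with your card's product ID)
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "XXXX" }
alias = { "vendor_id": "10de", "product_id": "XXXX", "device_type": "type-PCI", "name": "gpu" }
```

Then a flavor like `openstack flavor set gpu.small --property "pci_passthrough:alias"="gpu:1"` hands the whole card to a single VM.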

Regarding licensing - with the vGPU approach you always license the clients, not the hypervisors. So you don't need any license to create VMs with vGPUs, just the hypervisor driver, which can be downloaded from the Nvidia enterprise portal. You will also be able to test whether vGPU works inside a VM, since a missing license only applies restrictions after some time. The license type also depends on the workloads you want to run: for AI training workloads you will most likely need a vCS license, but then the vGPUs can only be used for compute, not for virtual desktops.
You can read more about licenses and their types here: https://docs.nvidia.com/grid/15.0/grid-licensing-user-guide/index.html

To be completely frank, if your workloads won't require CUDA support, I would look closely at AMD GPUs, since there is no licensing mess and their SR-IOV implementation is way more straightforward and clear, at least to me. So if you're looking for GPUs for virtual desktops, that might be a good option for you. However, Nvidia is far more widespread in OpenStack deployments, so you're more likely to find help and known gotchas for Nvidia rather than for any other GPU.

Thu, 12 Jan 2023, 07:58 Oliver Weinmann <oliver.weinmann@me.com>:

Dear All,

we are planning to run a POC on vGPUs in our OpenStack cluster. Therefore I have a few questions and generally wanted to ask how well vGPUs are supported in OpenStack. The docs, in particular:

https://docs.openstack.org/nova/zed/admin/virtual-gpu.html

explain quite well the general implementation.


But I am more interested in general experience with using vGPUs in OpenStack. We currently have a small Yoga cluster with a couple of compute nodes, planning to upgrade to Zed soon. Currently our users use consumer cards like the RTX 3050/3060 in their laptops, and the idea would be to provide vGPUs to these users. For this I would like to run a very small POC where we first equip one compute node with an Nvidia GPU. A few tips on which card would be a good starting point would also be highly appreciated. I know this heavily depends on the server hardware, but that is something I can figure out later. Also, do we need additional software licenses to run this? I saw this very nice presentation from CERN on vGPUs:

https://indico.cern.ch/event/776411/contributions/3345183/attachments/1851624/3039917/02_-_vGPUs_with_OpenStack_-_Accelerating_Science.pdf

In the table they list Quadro vDWS licenses. I assume we need these in order to use the cards? Also, do we need something like Cyborg for this, or is vGPU fully implemented in Nova?

Best Regards,

Oliver