On Thu, Jan 12, 2023 at 08:02, Oliver Weinmann <oliver.weinmann@me.com> wrote:

Dear All,

We are planning a POC on vGPUs in our OpenStack cluster. I therefore have a few questions and would like to ask how well vGPUs are supported in OpenStack in general. The docs, in particular:

https://docs.openstack.org/nova/zed/admin/virtual-gpu.html

explain the general implementation quite well.

Indeed, and that's why you won't find NVIDIA-specific documentation in there. Upstream documentation generally shouldn't cover specific hardware, only the general implementation.

But I am more interested in general experience with using vGPUs in OpenStack. We currently have a small Yoga cluster with a couple of compute nodes, and we plan to upgrade to Zed soon. Our users currently use consumer cards like the RTX 3050/3060 in their laptops, and the idea would be to provide vGPUs to these users. For this I would like to run a very small POC where we first equip one compute node with an NVIDIA GPU. Any tips on which card would be a good starting point would be much appreciated. I know this depends heavily on the server hardware, but that is something I can figure out later. Also, do we need additional software licenses to run this? I saw this very nice presentation from CERN on vGPUs:

https://indico.cern.ch/event/776411/contributions/3345183/attachments/1851624/3039917/02_-_vGPUs_with_OpenStack_-_Accelerating_Science.pdf

In the table they are listing Quadro vDWS licenses. I assume we need these in order to use the cards?

Disclaimer: I'm not an NVIDIA developer, I just enable their drivers, so some of my answers may be wrong, but let me try.

First, consumer cards like the RTX 3xxx GPUs don't support virtual GPUs, because NVIDIA has no vGPU license for them.

To create virtual GPUs, you need professional NVIDIA cards, such as the Tesla line or the datacenter Ampere GPUs. See this documentation, which covers both the supported hardware and the licenses you need (in case you want to run it on a RHEL compute node):

https://docs.nvidia.com/grid/13.0/grid-vgpu-release-notes-red-hat-el-kvm/index.html#validated-platforms

That being said, you'll quickly discover that those GPUs can be expensive, so it may be good for you to know that NVIDIA T4 GPUs work correctly for what you want to test.

Also do we need something like Cyborg for this or is VGPU fully implemented in Nova?

You can use either, but yes, virtual GPUs are fully supported within Nova as of now.
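
For the POC, once the NVIDIA vGPU host driver is installed on the compute node, the vGPU types the card offers show up in sysfs under /sys/class/mdev_bus, which is where the Nova documentation linked above tells you to look. Below is a minimal sketch, not an official tool, that lists them. The function name list_mdev_types is my own invention, and the script assumes the standard kernel mediated-device layout (mdev_supported_types/<type>/name and available_instances).

```python
from pathlib import Path


def list_mdev_types(sysfs_root="/sys/class/mdev_bus"):
    """Return {pci_address: {type_id: {"name": ..., "available_instances": ...}}}.

    Hypothetical helper: it only reads the kernel's mediated-device
    sysfs tree and does not talk to Nova or to the NVIDIA driver.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No mdev-capable device, or the host driver is not loaded.
        return result
    for dev in sorted(root.iterdir()):
        types_dir = dev / "mdev_supported_types"
        if not types_dir.is_dir():
            continue
        types = {}
        for type_dir in sorted(types_dir.iterdir()):
            name_file = type_dir / "name"
            avail_file = type_dir / "available_instances"
            types[type_dir.name] = {
                # Human-readable type name, if the driver exposes one.
                "name": name_file.read_text().strip()
                if name_file.is_file() else None,
                # How many more vGPUs of this type can be created right now.
                "available_instances": int(avail_file.read_text())
                if avail_file.is_file() else None,
            }
        result[dev.name] = types
    return result


if __name__ == "__main__":
    for pci, types in list_mdev_types().items():
        for type_id, info in types.items():
            print(pci, type_id, info["name"], info["available_instances"])
```

As far as I know, on Zed the type ID you pick from that list then goes into enabled_mdev_types in the [devices] section of nova.conf on the compute node, and a flavor asks for a vGPU with the property resources:VGPU=1. Please double-check both against the Nova admin guide for your release.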

HTH,

-Sylvain

Best Regards,

Oliver