Experience with VGPUs
Dear All,

we are planning a POC on vGPUs in our OpenStack cluster. I therefore have a few questions, and generally wanted to ask how well vGPUs are supported in OpenStack. The docs, in particular https://docs.openstack.org/nova/zed/admin/virtual-gpu.html, explain the general implementation quite well, but I am more interested in practical experience with using vGPUs in OpenStack.

We currently have a small Yoga cluster with a couple of compute nodes, planning to upgrade to Zed soon. Currently our users use consumer cards like the RTX 3050/3060 on their laptops, and the idea would be to provide vGPUs to these users. For this I would like to run a very small POC where we first equip one compute node with an Nvidia GPU. A few tips on which card would be a good starting point would also be highly appreciated; I know this heavily depends on the server hardware, but that is something I can figure out later.

Also, do we need additional software licenses to run this? I saw this very nice presentation from CERN on vGPUs: https://indico.cern.ch/event/776411/contributions/3345183/attachments/185162... In the table they list Quadro vDWS licenses; I assume we need these in order to use the cards? Also, do we need something like Cyborg for this, or is vGPU fully implemented in Nova?

Best Regards,
Oliver
Hi Oliver,

The presentation you linked was only *at* CERN, not *from* CERN (it was given during an OpenStack Day we organised here). Sylvain and/or Mohammed may be available to answer the questions you have related to that deck, or also in general about the integration of GPUs.

Now, *at* CERN we also have hypervisors with different GPUs in our fleet, and are also looking into the various options for provisioning them efficiently: as bare metal, as vGPUs, using MIG support, ... and we have submitted a presentation proposal for the upcoming summit to share our experiences.

If you have very specific questions, we can try to answer them here, but maybe there is wider interest and it would be more efficient to organize a session/call (e.g. as part of the OpenStack Operators activities or the Scientific SIG?) to exchange experiences on GPU integration and answer questions there. What do you and others think?

Cheers,
Arne
From: Brin Zhang <zhangbailin@inspur.com>
Sent: Friday, 13 January 2023 12:03
Subject: Re: Experience with VGPUs
> Also, do we need something like Cyborg for this, or is vGPU fully implemented in Nova?
You can try using Cyborg to manage your GPU devices; it also supports listing/attaching a vGPU for an instance. If you want to attach/detach a device from an instance, you have to change your flavor, because the vGPU/GPU info currently needs to be added to the flavor. (If you want to use this feature, such GPU metadata may need to be separated from the flavor; we have discussed this in the Nova team before.) I work at Inspur; in our InCloud OS product we use Cyborg to manage GPU/vGPU, FPGA, QAT, etc. devices, and have adapted the T4/T100 GPUs (which support vGPU) and the A100 (which supports MIG). I think Cyborg manages local GPU devices better; please refer to the Cyborg API docs: https://docs.openstack.org/api-ref/accelerator/
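For readers wanting to see the shape of that flow, a rough sketch (assuming the python-cyborgclient OSC plugin; the profile name is invented and the exact group keys depend on the driver and deployment):

    # create a device profile requesting one vGPU (hypothetical name and group)
    openstack accelerator device profile create vgpu-profile '[{"resources:VGPU": "1"}]'

    # reference it from a flavor; Nova then asks Cyborg to bind a device at boot
    openstack flavor set gpu.small --property "accel:device_profile=vgpu-profile"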
Hi Oliver,

I have some experience using Nvidia vGPUs (Tesla P4) in my own OpenStack cluster. The setup is pretty simple: follow the guides from Nvidia to install the Linux KVM drivers [1] and the OpenStack documentation [2] for attaching vGPU mdevs to your instances. Licensing is at the client (VM) side, not the server (hypervisor) side. The cards you mentioned you are using (RTX 3050/3060) don't support vGPU; there is a list of supported cards published by Nvidia [3].

For newer cards using MIG I have no experience, but I would expect the overall procedure to be similar.

As for AMD cards, AMD has stated that some of their MI series cards support SR-IOV for vGPUs. However, those drivers were never open-sourced or provided as closed source to the public; only large cloud providers are able to get them. So I don't really recommend getting AMD cards for vGPU unless you are able to get support from them.

Regards,
Gene Kuo

[1] https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#red-hat-el...
[2] https://docs.openstack.org/nova/latest/admin/virtual-gpu.html
[3] https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html
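For reference, the Nova side in the linked docs boils down to roughly this (a minimal sketch; nvidia-233 is a placeholder, read the real type names from mdev_supported_types under the card's PCI address):

    # /etc/nova/nova.conf on the compute node
    [devices]
    enabled_mdev_types = nvidia-233

    # flavor that requests one vGPU
    openstack flavor create --ram 4096 --disk 40 --vcpus 2 vgpu.small
    openstack flavor set vgpu.small --property "resources:VGPU=1"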
On Fri, 2023-01-13 at 13:07 +0000, Gene Kuo wrote:
> For newer cards using MIG I have no experience, but I would expect the overall procedure to be similar.

The main difference for MIG mode is that the mdevs are created on top of SR-IOV VFs, so from a Nova perspective, instead of listing the address of the PF, you enable the VFs in the config. It's more or less the same other than that on the Nova side. Obviously there is a little more work to configure the VFs and so on on the host for MIG mode, but it's mostly transparent to Nova: all that changes is which PCI device (the PF or a VF) provides the inventories of mdevs which Nova will attach to the VM. In the MIG case each VF exposes at most one mdev instance of a specific type; without MIG the PF exposes multiple instances of a single mdev type.
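In config terms the difference is just which addresses get listed (a sketch with invented addresses and type names):

    # time-sliced vGPU: nova.conf lists the physical function
    [mdev_nvidia-63]
    device_addresses = 0000:84:00.0

    # MIG-backed vGPU: list the SR-IOV virtual functions instead
    [mdev_nvidia-699]
    device_addresses = 0000:84:00.4,0000:84:00.5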
> As for AMD cards, AMD has stated that some of their MI series cards support SR-IOV for vGPUs. However, those drivers were never open-sourced or provided as closed source to the public; only large cloud providers are able to get them. So I don't really recommend getting AMD cards for vGPU unless you are able to get support from them.
Ya, so on the AMD side, if you happen to have those drivers, then instead of using Nova's vGPU feature you just use normal PCI passthrough. PCI passthrough in Nova, contrary to what some assume, was not originally added for SR-IOV networking: it was added for Intel QAT devices and supports PFs, VFs and non-SR-IOV-capable PCIe devices. As long as the device is stateless, you can use it with the generic PCI passthrough support via the PCI alias in the instance flavor. So if you have the driver, you just need to create a PCI alias for the AMD GPU VFs and use them like any other accelerator that supports SR-IOV.
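A hedged sketch of that alias setup (AMD's PCI vendor ID is 1002; the product ID and alias name here are invented):

    # /etc/nova/nova.conf ([pci] device_spec replaced passthrough_whitelist in Zed)
    [pci]
    device_spec = { "vendor_id": "1002", "product_id": "73a1" }
    alias = { "vendor_id": "1002", "product_id": "73a1", "device_type": "type-VF", "name": "amd-vgpu" }

    # flavor requesting one VF (the alias must also be configured on the API/scheduler nodes)
    openstack flavor set gpu.amd --property "pci_passthrough:alias"="amd-vgpu:1"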
From: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Sent: Friday, 13 January 2023 16:20

You are saying that as if Nvidia GRID drivers were open source, while in fact they're super far from being that. In order to download drivers, not only for hypervisors but also for guest VMs, you need to have an account in their Enterprise Portal. It took me roughly 6 weeks of discussions with hardware vendors and Nvidia support to get a proper account there, and that happened only after applying for their Partner Network (NPN).

That still doesn't solve the issue of how to provide drivers to guests, except by pre-building a series of images with these drivers pre-installed (we ended up making a DIB element for that [1]). Not to mention the need to distribute license tokens to guests, and the whole mess with compatibility between hypervisor and guest drivers (the guest driver can't be newer than the host one, and the hypervisor's can't be too new either).

It's not that I'm protecting AMD, but just saying that Nvidia is not that straightforward either, and at least on paper AMD vGPUs look easier both for operators and end users.

[1] https://github.com/citynetwork/dib-elements/tree/main/nvgrid
Hi Everyone,

thanks for the many replies and hints. I think I will go for an NVIDIA T4 for now and try to get it working in our OpenStack cluster by following your guidelines @Gene. I will report back on the progress.

Cheers,
Oliver
From: Cedric <yipikai7@gmail.com>
Sent: Friday, 13 January 2023 20:56

Ended up with the very same conclusions as Dmitriy regarding the use of Nvidia vGRID for the vGPU use case with Nova. It works pretty well, but:

- respecting the licensing model is an operational constraint; note that guests need to reach a license server in order to get a token (either via the Nvidia SaaS service or on-prem)
- drivers for both guest and hypervisor are not easy to deploy and maintain at large scale. A year ago, hypervisor drivers were not packaged for Debian/Ubuntu but built through a bash script, thus requiring additional automation work and careful attention to kernel updates/reboots of Nova hypervisors.

Cheers
From: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Sent: Friday, 13 January 2023 21:06

To have that said, the deb/rpm packages they provide don't help much, as:

* There is no repo for them, so you need to download them manually from the enterprise portal.
* They can't be upgraded anyway, as the driver version is part of the package name, and each package conflicts with any other one. So you need to explicitly remove the old package and only then install the new one. And yes, you must stop all VMs before upgrading the driver, and no, you can't live migrate GPU mdev devices due to that not being implemented in qemu. So deb/rpm/generic driver doesn't matter in the end, tbh.
From: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>

But despite all my rant: it's all related to the Nvidia part of things, not OpenStack. Support of GPUs and vGPUs is fair enough, and the Nova folks do their best to support that hardware.
Hi all,

just to add to the discussion: at CERN we have recently deployed a bunch of A100 GPUs in PCI passthrough mode, and are now looking into improving their usage by using MIG. From the Nova point of view things seem to work OK: we can schedule VMs requesting a VGPU, and the client starts up and gets a license token from our NVIDIA license server (distributing license keys in our private cloud is relatively easy in our case). It's a PoC only for the time being, and we're not ready to put it forward yet, as we're facing issues with CUDA on the client (it fails immediately in memory operations with 'not supported'; still investigating why this happens).

Once we get that working, it would be nice to have more fine-grained scheduling so that people can ask for MIG devices of different sizes. The other challenge is how to set limits on GPU resources. Once the above issues have been sorted out we may want to look into Cyborg as well, thus we are quite interested in first experiences with it.

Kind regards,
Ulrich
From: Sean Mooney <smooney@redhat.com>
Sent: Monday, 16 January 2023 11:33

On Mon, 2023-01-16 at 11:38 +0100, Ulrich Schwickerath wrote:
> Once we get that working, it would be nice to have more fine-grained scheduling so that people can ask for MIG devices of different sizes. The other challenge is how to set limits on GPU resources.
Those two use cases can kind of be fulfilled in Yoga. In Yoga we finally merged support for unified limits via Keystone: https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/unif... This allows you to create quotas/limits on any resource class. That is our intended way for you to set limits on GPU resources, by leveraging the generic mdev support in Xena to map different mdev types to different resource classes: https://specs.openstack.org/openstack/nova-specs/specs/xena/implemented/gene...

You can also use the provider configuration files https://specs.openstack.org/openstack/nova-specs/specs/victoria/implemented/... to simplify adding traits to the GPU resources in a declarative way to enable better scheduling, for example adding traits for the CUDA version supported by a given vGPU on a host.

So you could do something like this, assuming you have two GPU types, Alice and Bob. Alice supports CUDA 3 and has a small amount of VRAM (i.e. your older generation of GPUs). Bob is the new kid on the block with CUDA 9000 support and all the VRAM you could ask for (the latest and greatest GPU).

Using the Nova generic mdev feature you can map the Alice GPUs to CUSTOM_VGPU_ALICE and Bob to CUSTOM_VGPU_BOB, and using unified limits you can set a limit/quota of 10 CUSTOM_VGPU_ALICE resources and 1 CUSTOM_VGPU_BOB resource on a given project. Using provider.yaml you can tag the Alice GPUs with CUSTOM_CUDA_3 and the Bob GPUs with CUSTOM_CUDA_9000. Then you can create flavor definitions that request the different GPU types using resources:CUSTOM_VGPU_ALICE=1, and if you want to prevent images that need CUDA 9000 from being scheduled on the Alice GPUs, simply add trait:CUSTOM_CUDA_9000=required to the image.

So if you have Yoga, you have all of the above features available. Xena does not give you the quota enforcement, but you can do all the scheduling bits. provider.yaml is entirely optional, but it has been around the longest. Some of this would also just work for Cyborg, since it should be using custom resource classes to model the GPUs in Placement already. We started adding generic PCI devices to Placement in Zed and are completing it this cycle: https://specs.openstack.org/openstack/nova-specs/specs/2023.1/approved/pci-d... so the same unified limits approach will work for PCI passthrough going forward too.

Hopefully this helps you meet those use cases. We don't really have any good production examples of people combining all of the above features, so if you do use them as described, feedback is welcome. We designed these features to work together in this way, but since they are relatively new additions we suspect many operators have not used them yet or don't know about their existence.
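To make that concrete, a hedged sketch of the pieces described above (type names, addresses, provider name and traits are invented for the example):

    # nova.conf: map each mdev type to its own resource class (generic mdev, Xena+)
    [devices]
    enabled_mdev_types = nvidia-35, nvidia-36
    [mdev_nvidia-35]
    device_addresses = 0000:84:00.0
    mdev_class = CUSTOM_VGPU_ALICE
    [mdev_nvidia-36]
    device_addresses = 0000:85:00.0
    mdev_class = CUSTOM_VGPU_BOB

    # /etc/nova/provider_config/cuda.yaml: declaratively tag the Alice GPU's
    # resource provider (check "openstack resource provider list" for the real name)
    meta:
      schema_version: '1.0'
    providers:
      - identification:
          name: compute1.example.com_pci_0000_84_00_0
        traits:
          additional:
            - CUSTOM_CUDA_3

    # flavor requesting an Alice vGPU, plus a project limit via keystone unified
    # limits (Yoga+; assumes a matching registered limit already exists)
    openstack flavor set vgpu.alice --property "resources:CUSTOM_VGPU_ALICE=1"
    openstack limit create --service nova --project demo --resource-limit 10 class:CUSTOM_VGPU_ALICE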
On 16/01/2023 11:33, Sean Mooney wrote:
> You can also use the provider configuration files https://specs.openstack.org/openstack/nova-specs/specs/victoria/implemented/... to simplify adding traits to the GPU resources in a declarative way to enable better scheduling, for example adding traits for the CUDA version supported by a given vGPU on a host.
Very interesting - I started to look at some ansible to deploy these provider config files. There is a note at the end of the doc saying "it is recommended to use the schema provided by nova to validate the config using a simple jsonschema validator". The natural place to do this with ansible would be https://docs.ansible.com/ansible/latest/collections/ansible/builtin/copy_mod... but I can't find a way to do that on a YAML file with a jsonschema CLI one-liner. What would be the right way to validate the YAML with the ansible copy module?

Thanks,
Jon.
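One way that might work, sketched under the assumption that PyYAML and jsonschema are installed on the target and that the schema from the nova docs has been saved to a local file (the paths below are hypothetical): jsonschema validates any parsed object, so load the YAML first, then wire the same one-liner into the copy module's validate option (%s is the staged file):

    # standalone check
    python3 -c 'import sys, yaml, json, jsonschema; jsonschema.validate(yaml.safe_load(open(sys.argv[1])), json.load(open(sys.argv[2])))' provider.yaml provider_config_schema.json

    # in the playbook
    - name: Deploy provider config
      ansible.builtin.copy:
        src: provider.yaml
        dest: /etc/nova/provider_config/provider.yaml
        validate: >-
          python3 -c 'import yaml, json, jsonschema;
          jsonschema.validate(yaml.safe_load(open("%s")),
          json.load(open("/etc/nova/provider_config_schema.json")))'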
From: Alex Song (宋文平) <songwenping@inspur.com>
Sent: Tuesday, 17 January 2023 03:37

Hi, Ulrich:

Sean is the expert on vGPU management on the Nova side. I'll complete the usage steps if you are using Nova to manage MIGs. For example:

1. Divide the A100 (80G) GPUs into 1g.10gb*1 + 2g.20gb*1 + 3g.40gb*1 (one 1g.10gb, one 2g.20gb and one 3g.40gb).
2. Add the device config in nova.conf:

[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701
[mdev_nvidia-699]
device_addresses = 0000:84:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3

3. Configure the flavor metadata with VGPU:1 and create a VM using the flavor; the VM will randomly allocate one MIG from [1g.10gb, 2g.20gb, 3g.40gb].

On step 2, if you have two A100 (80G) GPUs on one node to use with MIG, and the other GPU is divided into 1g.10gb*3 + 4g.40gb*1, the config may look like this:

[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701,nvidia-702
[mdev_nvidia-699]
device_addresses = 0000:84:00.1,0000:3b:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3
[mdev_nvidia-702]
device_addresses = 0000:3b:00.3

In our product we use Cyborg to manage the MIGs. In the legacy style we also need to configure the MIGs like Nova does, which is difficult to maintain, especially when deploying OpenStack on k8s, so we removed these configs, automatically discover the MIGs, and support dividing MIGs via the Cyborg API. By creating a device profile with vGPU type traits (nvidia-699, nvidia-700), we can specify the MIG size with which to create VMs.

Kind regards
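For completeness, step 1 (carving up the card) happens on the hypervisor with nvidia-smi. A sketch, with the profile IDs assumed from an A100 80GB (list the real ones with nvidia-smi mig -lgip):

    # enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect)
    nvidia-smi -i 0 -mig 1

    # create one 1g.10gb, one 2g.20gb and one 3g.40gb GPU instance, plus the
    # default compute instances (-C); IDs 19/14/9 are assumed
    nvidia-smi mig -i 0 -cgi 19,14,9 -C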
From: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Sent: Tuesday, 17 January 2023 12:16

Oh, wait a second, can you have multiple different types on one GPU? I don't think you can, or maybe it's limited to MIG mode only; I'm using mostly vGPUs, so I'm not 100% sure about MIG mode. But eventually on vGPU, once you create one type, all others become unavailable. So originally each command reports one available instance:

# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
1

BUT, once you create an mdev of a specific type, the rest no longer report as available:

# echo $(uuidgen) > /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/create
# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
0
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
0

Please correct me if I'm wrong here and Nvidia did make some changes with recent drivers, or if this applies only to vGPUs and is not the case for MIG mode.
On Tue, Jan 17, 2023 at 12:22, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
> Please correct me if I'm wrong here and Nvidia did make some changes with recent drivers, or if this applies only to vGPUs and is not the case for MIG mode.
No, you're unfortunately right:
https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#valid-vg...

For time-sliced vGPUs, you need to use the same type for one pGPU. Of course, if a card has multiple pGPUs, you can have multiple types, one per PCI ID. Technically, Nvidia says you need to use the same framebuffer size, but that eventually means the same thing. For MIG-backed vGPUs, you can surely mix types after creating the MIG instances.

-S
From: Danny Webb <Danny.Webb@thehutgroup.com>
Sent: Tuesday, 17 January 2023 11:50

MIG allows for a limited variation of instance types on the same card, unlike vGPU which requires a heterogenous implementation. See https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-profiles for more details.
sorry, meant to say vGPU requires a homogeneous implementation.
On Tue, Jan 17, 2023 at 4:54 PM Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
Oh, wait a second, can you have multiple different types on 1 GPU? As I don't think you can, or maybe it's limited to MIG mode only - I'm using mostly vGPUs so I'm not 100% sure about MIG mode. But eventually on vGPU, once you create 1 type, all others become unavailable. So originally each command reports availability:

# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
1

BUT, once you create an mdev of a specific type, the rest will not report as available anymore:

# echo $(uuidgen) > /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/create
# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
0
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
0

Please correct me if I'm wrong here and Nvidia did some changes with recent drivers, or if this applies only to vGPUs and is not the case for MIG mode.
I have created an A40-24Q instance out of an A40 48GB GPU, but I experience the same.
Tue, 17 Jan 2023, 03:37 Alex Song (宋文平) <songwenping@inspur.com>:
Hi, Ulrich:
Sean is an expert on vGPU management from the Nova side. Let me complete the usage steps if you are using Nova to manage MIGs, for example:
1. Divide the A100 (80G) GPU into 1g.10gb*1 + 2g.20gb*1 + 3g.40gb*1 (one 1g.10gb, one 2g.20gb and one 3g.40gb).
2. Add the device config in nova.conf:
[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701
[mdev_nvidia-699]
device_addresses = 0000:84:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3
3. Configure the flavor metadata with VGPU:1 and create a VM using the flavor; the VM will randomly be allocated one MIG from [1g.10gb, 2g.20gb, 3g.40gb].
On step 2, if you have 2 A100 (80G) GPUs on one node to use MIG, and the other GPU is divided into 1g.10gb*3 + 4g.40gb*1, the config may look like this:
[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701,nvidia-702
[mdev_nvidia-699]
device_addresses = 0000:84:00.1,0000:3b:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3
[mdev_nvidia-702]
device_addresses = 0000:3b:00.3
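For step 3, a minimal sketch of the flavor side (the flavor name and sizes here are examples, not from the original mail):

# openstack flavor create --vcpus 4 --ram 8192 --disk 40 vgpu-small
# openstack flavor set vgpu-small --property "resources:VGPU=1"
# openstack server create --flavor vgpu-small --image <your-image> vgpu-test

The resources:VGPU=1 extra spec is what makes Placement pick a host that still has a free mediated device of one of the enabled types.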
In our product, we use Cyborg to manage the MIGs. In the legacy style we also need to configure the MIGs like in Nova, which is difficult to maintain, especially when deploying OpenStack on k8s, so we removed this config, automatically discover the MIGs, and support dividing MIGs via the Cyborg API. By creating a device profile with vGPU type traits (nvidia-699, nvidia-700), we can specify the MIG size when creating VMs.
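As an illustration of that device-profile approach, something along these lines should work (the profile name and trait are hypothetical, not from the original mail):

# openstack accelerator device profile create vgpu-1g10gb '[{"resources:VGPU": "1", "trait:CUSTOM_NVIDIA_699": "required"}]'
# openstack flavor set gpu-flavor --property "accel:device_profile=vgpu-1g10gb"

Nova then asks Cyborg for an accelerator matching the profile at boot time, instead of reading mdev addresses from nova.conf.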
Kind regards
Hello,

We are using vGPUs with Nova on the OpenStack Xena release and we've had a fairly good experience integrating NVIDIA A10 GPUs into our cloud.

As we see it, there are some pain points that just go with maintaining the GPU feature.

- There is a very tight coupling of the NVIDIA driver in the guest (instance) and on the compute node that needs to be managed.

- Doing maintenance needs more planning, i.e. powering off instances; the NVIDIA driver on the compute node needs to be rebuilt on the hypervisor if the kernel is upgraded, unless you've implemented DKMS for that.

- Because we have different flavors of GPU (we split the A10 cards into different flavors for maximum utilization of other compute resources) we added custom traits in the Placement service to handle that, handling it with a script, since doing anything manually related to GPUs you will get confused quickly. [1]

- Since Nova does not handle recreation of mdevs (or use the new libvirt autostart feature for mdevs) we have a systemd unit that executes before the nova-compute service, walks all the libvirt domains and does lookups in Placement to recreate the mdevs before nova-compute starts. [2] [3] [4]

Best regards
Tobias

DISCLAIMER: Below is provided without any warranty of actually working for you or your setup and does very specific things that we need; it is only provided to give you some insight and help. Use at your own risk.

[1] https://paste.opendev.org/show/b6FdfwDHnyJXR0G3XarE/
[2] https://paste.opendev.org/show/bGtO6aIE519uysvytWv0/
[3] https://paste.opendev.org/show/bftOEIPxlpLptkosxlL6/
[4] https://paste.opendev.org/show/bOYBV6lhRON4ntQKYPkb/
Le mar. 17 janv. 2023 à 10:00, Tobias Urdin <tobias.urdin@binero.com> a écrit :
Hello,
We are using vGPUs with Nova on the OpenStack Xena release and we've had a fairly good experience integrating NVIDIA A10 GPUs into our cloud.
Great to hear, thanks for your feedback, much appreciated Tobias.
As we see it, there are some pain points that just go with maintaining the GPU feature.
- There is a very tight coupling of the NVIDIA driver in the guest (instance) and on the compute node that needs to be managed.
As nvidia provides proprietary drivers, there isn't much we can do upstream, even for CI testing. Many participants in this thread explained this as a common concern and I understand their pain, but yeah, you need third-party tooling for managing both the driver installation and the licensing servers.
- Doing maintenance needs more planning, i.e. powering off instances; the NVIDIA driver on the compute node needs to be rebuilt on the hypervisor if the kernel is upgraded, unless you've implemented DKMS for that.
Ditto; unfortunately, I wish the driver could be less kernel-dependent, but I don't see a foreseeable future for this.
- Because we have different flavors of GPU (we split the A10 cards into different flavors for maximum utilization of other compute resources) we added custom traits in the Placement service to handle that, handling it with a script, since doing anything manually related to GPUs you will get confused quickly. [1]
True, that's why you can also use generic mdevs which will create different resource classes (but ssssht) or use the placement.yaml file to manage your inventories. https://specs.openstack.org/openstack/nova-specs/specs/xena/implemented/gene...
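As a sketch of that generic-mdev style in nova.conf (the mdev types and the custom resource class names here are examples):

[devices]
enabled_mdev_types = nvidia-699,nvidia-700
[mdev_nvidia-699]
device_addresses = 0000:84:00.1
mdev_class = CUSTOM_MIG_1G_10GB
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
mdev_class = CUSTOM_MIG_2G_20GB

A flavor can then request a specific size with --property "resources:CUSTOM_MIG_1G_10GB=1" instead of the generic VGPU resource.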
- Since Nova does not handle recreation of mdevs (or use the new libvirt autostart feature for mdevs) we have a systemd unit that executes before the nova-compute service, walks all the libvirt domains and does lookups in Placement to recreate the mdevs before nova-compute starts. [2] [3] [4]
This is a known issue and we agreed at the last PTG on a direction. Patches are up for review. https://review.opendev.org/c/openstack/nova/+/864418 Thanks, -Sylvain
Hi Tobias, Thank you for posting the scripts to recreate the mdevs; those are very useful and have worked OK in our environment. Managing the race conditions between the nvidia gpu manager starting, re-creating the mdevs and holding off nova-compute starting until that is all complete seems quite tricky. I see the comments on https://review.opendev.org/c/openstack/nova/+/864418 and I'm also interested to know how the ordering between udev rules execution and the nvidia driver being sufficiently initialised to create mdevs can be expressed. Thanks again for the scripts, Jonathan.
On Mon, 2023-02-06 at 13:15 +0000, Jonathan Rosser wrote:
Hi Tobias,
Thank you for posting the scripts to recreate the mdevs; those are very useful and have worked OK in our environment.
Managing the race conditions between the nvidia gpu manager starting, re-creating the mdevs and holding off nova-compute starting until that is all complete seems quite tricky. Assuming you are using systemd, you can use the Before/After statements to orchestrate that.
If you can't modify either of the systemd service files, just create an entirely new one for the synchronisation: make it run After the nvidia-gpu-manager service and Before the nova-compute/libvirt services, and just invoke /bin/true. See https://www.freedesktop.org/software/systemd/man/systemd.unit.html for details; a minimal sketch follows.
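A minimal sketch of such a unit (the unit and service names are assumptions; check what your driver package actually installs):

# /etc/systemd/system/vgpu-ordering.service
[Unit]
Description=Order libvirt/nova-compute after the NVIDIA vGPU manager
After=nvidia-vgpu-mgr.service
Before=libvirtd.service nova-compute.service

[Service]
Type=oneshot
ExecStart=/bin/true
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target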
I see the comments on https://review.opendev.org/c/openstack/nova/+/864418 and I'm also interested to know how the ordering between udev rules execution and the nvidia driver being sufficiently initialised to create mdev can be expressed.
In this case it's not really required: when the PF device is fully initialised, that can be used as the trigger for the udev rule to configure the VFs. Alternatively you can just use systemd service files. In general this is out of scope for the nova docs; we just want to provide some simple references that people can use, but the goal is not to prescribe a specific implementation that will be used. One approach would be to use something like https://github.com/NVIDIA/mig-parted instead (a sketch of its config style follows) and just ensure that the nova-compute and libvirt services are not executed until after that has completed. This is very much a case of "follow the recommendation of the manufacturer of the hardware"; from an upstream perspective we just want to document something that should work for most people, but we don't necessarily want to fully prescribe what to do.
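For reference, mig-parted is driven by a declarative config; a minimal sketch in its documented format (the profile name and counts are examples for an A100 80GB):

version: v1
mig-configs:
  all-1g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7

applied with something like:

# nvidia-mig-parted apply -f config.yaml -c all-1g.10gb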
Thanks again for the scripts, Jonathan.
Hi Ulrich, I believe this is a perfect use case for Cyborg, which provides state-of-the-art heterogeneous hardware management and is easy to use. cc: Brin Zhang Thank you Regards Li Liu On Mon, Jan 16, 2023 at 5:39 AM Ulrich Schwickerath <Ulrich.Schwickerath@cern.ch> wrote:
Hi, all,
just to add to the discussion, at CERN we have recently deployed a bunch of A100 GPUs in PCI passthrough mode, and are now looking into improving their usage by using MIG. From the NOVA point of view things seem to work OK, we can schedule VMs requesting a VGPU, the client starts up and gets a license token from our NVIDIA license server (distributing license keys in our private cloud is relatively easy in our case). It's a PoC only for the time being, and we're not ready to put that forward as we're facing issues with CUDA on the client (it fails immediately in memory operations with 'not supported', still investigating why this happens).
Once we get that working it would be nice to have more fine-grained scheduling so that people can ask for MIG devices of different sizes. The other challenge is how to set limits on GPU resources. Once the above issues have been sorted out we may want to look into Cyborg as well, so we are quite interested in first experiences with this.
Kind regards,
Ulrich On 13.01.23 21:06, Dmitriy Rabotyagov wrote:
To have that said, the deb/rpm packages they are providing don't help much, as:
* There is no repo for them, so you need to download them manually from the enterprise portal.
* They can't be upgraded anyway, as the driver version is part of the package name, and each package conflicts with any other one. So you need to explicitly remove the old package and only then install the new one. And yes, you must stop all VMs before upgrading the driver, and no, you can't live migrate GPU mdev devices due to that not being implemented in qemu. So deb/rpm/generic driver doesn't matter at the end tbh.
Fri, 13 Jan 2023, 20:56 Cedric <yipikai7@gmail.com>:
Ended up with the very same conclusions as Dmitriy regarding the use of Nvidia GRID for the vGPU use case with Nova; it works pretty well, but:
- respecting the licensing model is an operational constraint; note that guests need to reach a license server in order to get a token (could be via the Nvidia SaaS service or on-prem)
- drivers for both guest and hypervisor are not easy to implement and maintain at large scale. A year ago, hypervisor drivers were not packaged for Debian/Ubuntu, but built through a bash script, thus requiring additional automation work and careful attention regarding kernel updates/reboots of Nova hypervisors.
Cheers
On Fri, Jan 13, 2023 at 4:21 PM Dmitriy Rabotyagov < noonedeadpunk@gmail.com> wrote:
You are saying that like Nvidia GRID drivers are open-sourced, while in fact they're super far from being that. In order to download drivers not only for hypervisors, but also for guest VMs, you need to have an account in their Enterprise Portal. It took me roughly 6 weeks of discussions with hardware vendors and Nvidia support to get a proper account there. And that happened only after applying for their Partner Network (NPN). That still doesn't solve the issue of how to provide drivers to guests, except pre-building a series of images with these drivers pre-installed (we ended up making a DIB element for that [1]). Not to mention the need to distribute license tokens for guests and the whole mess with compatibility between hypervisor and guest drivers (as the guest driver can't be newer than the host one, and HVs can't be too new either).
It's not that I'm protecting AMD, but just saying that Nvidia is not that straightforward either, and at least on paper AMD vGPUs look easier both for operators and end-users.
[1] https://github.com/citynetwork/dib-elements/tree/main/nvgrid
As for AMD cards, AMD stated that some of their MI series cards support SR-IOV for vGPUs. However, those drivers are neither open source nor provided as closed source to the public; only large cloud providers are able to get them. So I don't really recommend getting AMD cards for vGPU unless you are able to get support from them.
-- Thank you Regards Li
Hello Ulrich, I am relaunching this discussion as I noticed that you gave a talk about this topic at the OpenInfra Summit in Vancouver. Is it possible to share the presentation here? I hope the talks will be uploaded soon to YouTube. We are mainly interested in using MIG instances in an OpenStack cloud and I could not really find a lot of information by googling. If you could share your experiences, that would be great. Cheers. Regards Mahendra
Le mar. 20 juin 2023 à 15:12, PAIPURI Mahendra <mahendra.paipuri@cnrs.fr> a écrit :
Hello Ulrich,
I am relaunching this discussion as I noticed that you gave a talk about this topic at the OpenInfra Summit in Vancouver. Is it possible to share the presentation here? I hope the talks will be uploaded soon to YouTube.

We are mainly interested in using MIG instances in an OpenStack cloud and I could not really find a lot of information by googling. If you could share your experiences, that would be great.
Due to scheduling conflicts, I wasn't able to attend Ulrich's session, but I will be listening closely to his feedback. FWIW, there was also a short session about how to enable MIG and play with Nova at the OpenInfra stage (and that one I was able to attend), and it was quite seamless. What exact information are you looking for? The idea with MIG is that you need to create SR-IOV VFs above the MIG instances using the sriov-manage script provided by nvidia, so that the mediated devices will use those VFs as the base PCI devices for Nova. Cheers.
Regards
Mahendra
Thanks Sylvain for the pointers. One of the questions we have is: can we create MIG profiles on the host and then attach one or more profiles to VMs? This bug [1] reports that once we attach one profile to a VM, the rest of the MIG profiles become unavailable. From what you have said about using SR-IOV and VFs, I guess this should be possible. I think you are talking about the "vGPUs with OpenStack Nova" talk on the OpenInfra stage. I will look into it once the videos are online. [1] https://bugs.launchpad.net/nova/+bug/2008883 Thanks Regards Mahendra
Le mar. 20 juin 2023 à 16:31, Mahendra Paipuri <mahendra.paipuri@cnrs.fr> a écrit :
Thanks Sylvain for the pointers.
One of the questions we have is: can we create MIG profiles on the host and then attach one or more profiles to VMs? This bug [1] reports that once we attach one profile to a VM, the rest of the MIG profiles become unavailable. From what you have said about using SR-IOV and VFs, I guess this should be possible.
Correct, what you need is to first create the VFs using sriov-manage, and then you can create the MIG instances. Once you create the MIG instances using the profiles you want, you will see that the related available_instances for the nvidia mdev type (by looking at sysfs) will say that you can have a single vGPU for this profile. Then, you can use that mdev type with Nova via nova.conf; a rough sketch of the sequence follows. That being said, while the above is simple, the talk below went into more detail about how to correctly use the GPU on the host side, so please wait :-)
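As a rough sketch of that sequence on the hypervisor (the PCI addresses, GPU index, profile and mdev type here are examples, not from the original mails; the sriov-manage path is the one NVIDIA's vGPU host driver ships):

# /usr/lib/nvidia/sriov-manage -e 0000:84:00.0
# nvidia-smi -i 0 -mig 1
# nvidia-smi mig -i 0 -cgi 1g.10gb -C
# cat /sys/bus/pci/devices/0000:84:00.4/mdev_supported_types/nvidia-699/available_instances
1

The first command creates the VFs on the physical GPU, the next two enable MIG mode and create a GPU instance plus compute instance, and a matching mdev type on one of the VFs then reports capacity for Nova's enabled_mdev_types.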
I think you are talking about the "vGPUs with OpenStack Nova" talk on the OpenInfra stage. I will look into it once the videos are online.
Indeed. -S
[1] https://bugs.launchpad.net/nova/+bug/2008883
Hi, all, Sylvain explained quite well how to do it technically. We have a PoC running; however, we still have some stability issues, as mentioned at the summit. We're running the NVIDIA virtualisation drivers on the hypervisors and the guests, which requires a license from NVIDIA. In our configuration we are still quite limited in the sense that we have to configure all cards in the same hypervisor in the same way, that is, with the same MIG partitioning. Also, it is not possible to attach more than one device to a single VM. As mentioned in the presentation we are a bit behind with Nova, and in the process of fixing this as we speak. Because of that we had to do a couple of backports in Nova to make it work, which we hope to be able to get rid of with the ongoing upgrades. Let me see if I can make the slides available here. Cheers, Ulrich On 20/06/2023 19:07, Oliver Weinmann wrote:
Hi everyone,
Jumping into this topic again. Unfortunately I haven't had time yet to test Nvidia vGPU in OpenStack, but I have in VMware vSphere. What our users complain most about is the inflexibility, since you have to use the same profile on all VMs that use the GPU. One user mentioned trying SLURM. I know there is no official OpenStack project for SLURM, but I wonder if anyone else has tried this approach? If I understood correctly, this would also not require any Nvidia subscription, since you pass through the GPU to a single instance and you use neither vGPU nor MIG.
Cheers, Oliver
Sent from my iPhone
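For the passthrough approach Oliver describes, a minimal nova.conf sketch (the vendor/product IDs and the alias name are examples; on older releases device_spec is called passthrough_whitelist):

[pci]
device_spec = { "vendor_id": "10de", "product_id": "2235" }
alias = { "vendor_id": "10de", "product_id": "2235", "device_type": "type-PCI", "name": "gpu" }

# openstack flavor set gpu-passthrough --property "pci_passthrough:alias"="gpu:1"

The whole card then goes to a single instance, with no vGPU/MIG licensing involved, at the cost of any sharing.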
Hi, again, here's a link to my slides: https://cernbox.cern.ch/s/v3YCyJjrZZv55H2 Let me know if it works. Cheers, Ulrich
Hi, all,
Sylvain explained quite well how to do it technically. We have a PoC running, however, still have some stability issues, as mentioned on the summit. We're running the NVIDIA virtualisation drivers on the hypervisors and the guests, which requires a license from NVIDIA. In our configuration we are still quite limited in the sense that we have to configure all cards in the same hypervisor in the same way, that is the same MIG partitioning. Also, it is not possible to attach more than one device to a single VM.
As mentioned in the presentation we are a bit behind with Nova, and in the process of fixing this as we speak. Because of that we had to do a couple of back ports in Nova to make it work, which we hope to be able to get rid of by the ongoing upgrades.
Let me see if I can make the slides available here.
Cheers, Ulrich
On 20/06/2023 19:07, Oliver Weinmann wrote:
Hi everyone,
Jumping into this topic again. Unfortunately I haven’t had time yet to test Nvidia VGPU in OpenStack but in VMware Vsphere. What our users complain most about is the inflexibility since you have to use the same profile on all vms that use the gpu. One user mentioned to try SLURM. I know there is no official OpenStack project for SLURM but I wonder if anyone else tried this approach? If I understood correctly this would also not require any Nvidia subscription since you passthrough the GPU to a single instance and you don’t use VGPU nor MIG.
Cheers, Oliver
Von meinem iPhone gesendet
Am 20.06.2023 um 17:34 schrieb Sylvain Bauza <sbauza@redhat.com>:
Le mar. 20 juin 2023 à 16:31, Mahendra Paipuri <mahendra.paipuri@cnrs.fr> a écrit :
Thanks Sylvain for the pointers.
One of the questions we have is: can we create MIG profiles on the host and then attach each one or more profile(s) to VMs? This bug [1] reports that once we attach one profile to a VM, rest of MIG profiles become unavailable. From what you have said about using SR-IOV and VFs, I guess this should be possible.
Correct, what you need is to create first the VFs using sriov-manage and then you can create the MIG instances. Once you create the MIG instances using the profiles you want, you will see that the related available_instances for the nvidia mdev type (by looking at sysfs) will say that you can have a single vGPU for this profile. Then, you can use that mdev type with Nova using nova.conf.
That being said, while this above is simple, the below talk was saying more about how to correctly use the GPU by the host so please wait :-)
I think you are talking about "vGPUs with OpenStack Nova" talk on OpenInfra stage. I will look into it once the videos will be online.
Indeed. -S
[1] https://bugs.launchpad.net/nova/+bug/2008883
Thanks
Regards
Mahendra
On 20/06/2023 15:47, Sylvain Bauza wrote:
Le mar. 20 juin 2023 à 15:12, PAIPURI Mahendra <mahendra.paipuri@cnrs.fr> a écrit :
Hello Ulrich,
I am relaunching this discussion as I noticed that you gave a talk about this topic at OpenInfra Summit in Vancouver. Is it possible to share the presentation here? I hope the talks will be uploaded soon in YouTube.
We are mainly interested in using MIG instances in Openstack cloud and I could not really find a lot of information by googling. If you could share your experiences, that would be great.
Due to scheduling conflicts, I wasn't able to attend Ulrich's session but his feedback will be greatly listened to by me.
FWIW, there was also a short session about how to enable MIG and play with Nova at the OpenInfra stage (and that one I was able to attend it), and it was quite seamless. What exact information are you looking for ? The idea with MIG is that you need to create SRIOV VFs above the MIG instances using sriov-manage script provided by nvidia so that the mediated devices will use those VFs as the base PCI devices to be used for Nova.
Cheers.
Regards
Mahendra
------------------------------------------------------------------------ *De :* Ulrich Schwickerath <Ulrich.Schwickerath@cern.ch> *Envoyé :* lundi 16 janvier 2023 11:38:08 *À :* openstack-discuss@lists.openstack.org *Objet :* Re: 答复: Experience with VGPUs
Hi, all,
just to add to the discussion, at CERN we have recently deployed a bunch of A100 GPUs in PCI passthrough mode, and are now looking into improving their usage by using MIG. From the NOVA point of view things seem to work OK, we can schedule VMs requesting a VGPU, the client starts up and gets a license token from our NVIDIA license server (distributing license keys is our private cloud is relatively easy in our case). It's a PoC only for the time being, and we're not ready to put that forward as we're facing issues with CUDA on the client (it fails immediately in memory operations with 'not supported', still investigating why this happens).
Once we get that working it would be nice to be able to have a more fine grained scheduling so that people can ask for MIG devices of different size. The other challenge is how to set limits on GPU resources. Once the above issues have been sorted out we may want to look into cyborg as well thus we are quite interested in first experiences with this.
Kind regards,
Ulrich
On 13.01.23 21:06, Dmitriy Rabotyagov wrote:
To have that said, the deb/rpm packages they provide don't help much, as:
- There is no repo for them, so you need to download them manually from the enterprise portal.
- They can't be upgraded anyway, as the driver version is part of the package name, and each package conflicts with any other one. So you need to explicitly remove the old package and only then install the new one.
And yes, you must stop all VMs before upgrading the driver, and no, you can't live migrate GPU mdev devices, due to that not being implemented in QEMU. So deb/rpm/generic driver doesn't matter in the end, tbh.
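So an upgrade is really a stop-everything, remove-then-install dance, roughly like this (package names below are made up for illustration; the real ones embed the exact driver version):

    # stop or shelve every VM that uses a vGPU on this host first;
    # mdev-backed guests can't be live migrated away
    rpm -e NVIDIA-vGPU-rhel-8.6-510.85.03
    rpm -ivh NVIDIA-vGPU-rhel-8.6-525.60.12.x86_64.rpm
    reboot  # typically needed to cleanly reload the vfio/mdev stack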
On Fri, 13 Jan 2023 at 20:56, Cedric <yipikai7@gmail.com> wrote:
Ended up with the very same conclusions as Dmitriy regarding the use of NVIDIA GRID for the vGPU use case with Nova; it works pretty well, but:
- Respecting the licensing model is an operational constraint; note that guests need to reach a license server in order to get a token (either via the NVIDIA SaaS service or on-prem).
- Drivers for both guest and hypervisor are not easy to deploy and maintain at scale. A year ago, hypervisor drivers were not packaged for Debian/Ubuntu but built through a bash script, thus requiring additional automation work and careful attention regarding kernel updates/reboots of Nova hypervisors.
Cheers
On Fri, Jan 13, 2023 at 4:21 PM Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
You are saying that as if NVIDIA GRID drivers were open-source, while in fact they're super far from being that. In order to download drivers, not only for hypervisors but also for guest VMs, you need to have an account in their Enterprise Portal. It took me roughly 6 weeks of discussions with hardware vendors and NVIDIA support to get a proper account there, and that happened only after applying for their Partner Network (NPN).
That still doesn't solve the issue of how to provide drivers to guests, except to pre-build a series of images with these drivers pre-installed (we ended up making a DIB element for that [1]). Not to mention the need to distribute license tokens to guests, and the whole mess with compatibility between hypervisor and guest drivers (the guest driver can't be newer than the host one, and hypervisors can't be too new either).
It's not that I'm defending AMD, but just saying that NVIDIA is not that straightforward either, and at least on paper AMD vGPUs look easier both for operators and end-users.
[1] https://github.com/citynetwork/dib-elements/tree/main/nvgrid
> As for AMD cards, AMD stated that some of their MI series cards support SR-IOV for vGPUs. However, those drivers are never open source or provided closed source to the public; only large cloud providers are able to get them. So I don't really recommend getting AMD cards for vGPU unless you are able to get support from them.
I can recall from quite recent release notes of the NVIDIA drivers that they now allow attaching multiple vGPUs to a single VM, but I also recall Sylvain saying that this is not exactly what it sounds like and that there are severe limitations to this advertised feature. Also, I think in MIG mode it's possible to split a GPU into a subset of supported (but different) flavors, though I have close to no idea how scheduling would be done in this case.
On Wed, Jun 21, 2023, 17:36 Ulrich Schwickerath <ulrich.schwickerath@cern.ch> wrote:
Hi, again,
here's a link to my slides:
https://cernbox.cern.ch/s/v3YCyJjrZZv55H2
Let me know if it works.
Cheers, Ulrich
On 21/06/2023 16:10, Ulrich Schwickerath wrote:
Hi, all,
Sylvain explained quite well how to do it technically. We have a PoC running; however, we still have some stability issues, as mentioned at the summit. We're running the NVIDIA virtualisation drivers on the hypervisors and the guests, which requires a license from NVIDIA. In our configuration we are still quite limited, in the sense that we have to configure all cards in the same hypervisor in the same way, that is, with the same MIG partitioning. Also, it is not possible to attach more than one device to a single VM.
As mentioned in the presentation, we are a bit behind with Nova and in the process of fixing this as we speak. Because of that, we had to do a couple of backports in Nova to make it work, which we hope to get rid of with the ongoing upgrades.
Let me see if I can make the slides available here.
Cheers, Ulrich
On 20/06/2023 19:07, Oliver Weinmann wrote:
Hi everyone,
Jumping into this topic again. Unfortunately I haven't had time yet to test NVIDIA vGPU in OpenStack, only in VMware vSphere. What our users complain most about is the inflexibility, since you have to use the same profile on all VMs that use the GPU. One user mentioned trying SLURM. I know there is no official OpenStack project for SLURM, but I wonder if anyone else has tried this approach? If I understood correctly, this would also not require any NVIDIA subscription, since you pass the GPU through to a single instance and use neither vGPU nor MIG.
Cheers, Oliver
Sent from my iPhone
On Wed, 21 Jun 2023 at 18:23, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
I can recall in quite recent release notes in Nvidia drivers, that now they do allow attaching multiple vGPUs to a single VM, but I can recall Sylvain said that is not exactly as it sounds like and there're severe limitations to this advertised feature.
That's the problem with this feature enablement in Nova: we mostly depend on a very specific external Linux driver. So, to be clear, if you want to use vGPU, please look at the NVIDIA documentation *before* :)
About multiple vGPUs, NVIDIA says it depends on the GPU architecture (and that has changed over the last few years); quoting NVIDIA here:
*The supported vGPUs depend on the architecture of the GPU on which the vGPUs reside:*
- *For GPUs based on the NVIDIA Volta architecture and later GPU architectures, all Q-series and C-series vGPUs are supported. On GPUs that support the Multi-Instance GPU (MIG) feature, both time-sliced and MIG-backed vGPUs are supported.*
- *For GPUs based on the NVIDIA Pascal™ architecture, only Q-series and C-series vGPUs that are allocated all of the physical GPU's frame buffer are supported.*
- *For GPUs based on the NVIDIA Maxwell™ graphic architecture, only Q-series vGPUs that are allocated all of the physical GPU's frame buffer are supported.*
*You can assign multiple vGPUs with differing amounts of frame buffer to a single VM, provided the board type and the series of all the vGPUs is the same. For example, you can assign an A40-48C vGPU and an A40-16C vGPU to the same VM. However, you cannot assign an A30-8C vGPU and an A16-8C vGPU to the same VM.*
https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-red-hat-el-kvm/i...
As a reminder, you can find the vGPU types here: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#virtual-...
Basically, what changed is that with the latest Volta and Ampere architectures NVIDIA became able to provide different vGPUs with a sliced frame buffer, while previously NVIDIA could only pin a vGPU taking the whole pGPU frame buffer to a single VM, which de facto limited an instance to one single vGPU attached (or having a second vGPU attached from another pGPU, which is non-trivial to schedule).
For that reason, we initially limited VGPU allocation requests to a maximum of 1 in Nova, since it was horribly hardware-dependent, but I eventually proposed to remove that limitation with https://review.opendev.org/c/openstack/nova/+/845757 which would need some further work and testing (which is nearly impossible with upstream CI, since the NVIDIA drivers are proprietary and licensed). Any operator wanting to lift that current limitation would get all my attention if he/she would volunteer for *testing* such a patch. Ping me on IRC #openstack-nova (bauzas) and we can proceed quickly.
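As a reminder, an instance asks for a vGPU through a flavor-based resource request, e.g.:

    openstack flavor create --ram 8192 --disk 80 --vcpus 4 vgpu.small
    openstack flavor set vgpu.small --property resources:VGPU=1

and it is exactly that resources:VGPU value which is currently capped at 1, and which the patch above would allow to be greater.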
Also I think in MIG mode it's possible to split GPU in a subset of supported (but different) flavors, though I have close to no idea how scheduling would be done in this case.
This is quite simple: you need to create different MIG instances using different (heterogeneous) profiles, and you'll then see that *some* mdev types will accordingly have an inventory of 1. You can then use a feature we introduced in Xena, which allows the Nova libvirt driver to expose different custom resource classes: https://specs.openstack.org/openstack/nova-specs/specs/xena/implemented/gene...
Again, testing this in real production is the crux of the problem. We provided as many functional tests as we could in order to verify such things, but getting a real MIG-backed GPU and setting the confs appropriately is something we are missing and which would be useful for tracking bugs.
Last point: I'm more than open to collaborating with CERN or any other operator wanting to stabilize the vGPU feature enablement in Nova. I know the existing feature has a quite long list of bug reports and some severe limitations, but I'd be more than happy to have some guidance from the operators on how and what to stabilize.
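A minimal sketch of what this looks like in nova.conf, assuming two heterogeneous MIG-backed mdev types (the type names, VF addresses and class names are only examples):

    [devices]
    enabled_mdev_types = nvidia-699, nvidia-700

    [mdev_nvidia-699]
    device_addresses = 0000:41:00.4
    mdev_class = CUSTOM_MIG_2G_10GB

    [mdev_nvidia-700]
    device_addresses = 0000:41:00.5
    mdev_class = CUSTOM_MIG_1G_5GB

A flavor can then target one profile with resources:CUSTOM_MIG_2G_10GB=1 and so on, and the scheduler will pick a host exposing that class.
-Sylvain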
Hello all,
Thanks @Ulrich for sharing the presentation. Very informative!!
One question: if I understood correctly, *time-sliced* vGPUs *absolutely need* GRID drivers and licensed clients for the vGPUs to work in the guests. For the MIG partitioning, there is *no need* to install GRID drivers in the guest and also *no need* to have licensed clients. Could you confirm if this is the actual case?
Cheers.
Regards
Mahendra
On Thu, 22 Jun 2023 at 10:43, Mahendra Paipuri <mahendra.paipuri@cnrs.fr> wrote:
Again, I'm not part of NVIDIA, nor am I paid by them, but you can look at their GRID licensing here: https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html
If you also look at the NVIDIA docs for RHEL support, you need a vCS (Virtual Compute Server) license for Ampere MIG profiles like the C-series: https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-red-hat-el-kvm/i...
Cheers.
Hi Oliver,
NVIDIA's vGPU/MIG are quite popular options, and using them doesn't really require Cyborg; they can be utilized solely with Nova/Placement. However, there are plenty of nuances, as the implementation of vGPUs also depends on the GPU architecture: Teslas are quite different from Amperes in how they are created driver-side and represented among Placement resources.
Also, I'm not sure that desktop cards like the RTX 3050 support vGPUs at all. Most likely the only option for this type of card will be PCI passthrough, which is supported quite well and super easy to implement, as it doesn't require any extra drivers. But if you want to leverage vGPUs/MIG, you will likely need cards like the A10 (which doesn't have MIG support) or the A30. Most of the supported models, along with the possible slices, are listed here: https://docs.nvidia.com/grid/15.0/grid-vgpu-user-guide/index.html#supported-...
Regarding licensing: with the vGPU approach you always license clients, not hypervisors. So you don't need any license to create VMs with vGPUs, just the hypervisor driver, which can be downloaded from the NVIDIA enterprise portal. And you will be able to test whether the vGPU works inside a VM, as an absent license only applies limitations after some time. The license type also depends on the workloads you want to run: for AI training workloads you most likely need a vCS license, but then the vGPUs can be used only as computational ones, not for virtual desktops. You can read more about licenses and their types here: https://docs.nvidia.com/grid/15.0/grid-licensing-user-guide/index.html
To be completely frank, if your workloads don't require CUDA support, I would look closely at AMD GPUs, since there is no mess with licensing and their implementation of SR-IOV is way more straightforward and clear, at least for me. So if you're looking for GPUs for virtual desktops, that might be a good option for you. However, NVIDIA is way more widespread in OpenStack workloads, so it's more likely you'll find help/gotchas regarding NVIDIA rather than any other GPU.
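If you do end up with plain passthrough for such cards, the Nova side looks roughly like this (the vendor/product IDs and the alias name are examples; on releases before Zed, the [pci] device_spec option is called passthrough_whitelist):

    # nova.conf on the compute node
    [pci]
    device_spec = { "vendor_id": "10de", "product_id": "2489" }
    alias = { "vendor_id": "10de", "product_id": "2489", "device_type": "type-PCI", "name": "gpu-pt" }

    # nova.conf on the controller: nova-api needs the same alias
    [pci]
    alias = { "vendor_id": "10de", "product_id": "2489", "device_type": "type-PCI", "name": "gpu-pt" }

    # flavor that grabs the whole card
    openstack flavor set gpu.pt --property "pci_passthrough:alias"="gpu-pt:1"

No GRID licensing is involved in that mode; the guest just runs a regular NVIDIA driver against the full card.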
On Thu, 12 Jan 2023 at 08:02, Oliver Weinmann <oliver.weinmann@me.com> wrote:
[snip] The docs, in particular:
https://docs.openstack.org/nova/zed/admin/virtual-gpu.html
explain quite well the general implementation.
Indeed, and that's why you can't find NVIDIA-specific documentation in there. Upstream documentation in general shouldn't cover specific hardware, but rather the general implementation.
[snip] Gladly also a few tips on which card would be a good starting point are highly appreciated. [...] Also do we need additional software licenses to run this? [...] In the table they are listing Quadro vDWS licenses. I assume we need these in order to use the cards?
Disclaimer: I'm not an NVIDIA developer and I just enable their drivers, so I may give wrong answers, but let me try. First, consumer cards like the RTX 3xxx GPUs don't support virtual GPUs, because there is no specific NVIDIA license for them. To be able to create virtual GPUs, you need professional NVIDIA cards like Tesla or Ampere. See this documentation, which explains both the supported hardware and the licenses you need (in case you want to run it from a RHEL compute): https://docs.nvidia.com/grid/13.0/grid-vgpu-release-notes-red-hat-el-kvm/ind...
That being said, you'll quickly discover those GPUs can be expensive, so it may be good for you to know that NVIDIA T4 GPUs work correctly for what you want to test.
Also do we need something like Cyborg for this or is VGPU fully implemented in Nova?
You can do either, but yes, virtual GPUs are fully supported within Nova as of now.
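To give an idea of how small the Nova side is once the GRID host driver is installed, a rough sketch for a time-sliced vGPU (the mdev type below is just an example; list /sys/class/mdev_bus/*/mdev_supported_types on the compute node for the real ones):

    # nova.conf on the compute node
    [devices]
    enabled_mdev_types = nvidia-222

    # then restart nova-compute, and boot with a flavor that has
    # the resources:VGPU=1 property set
    openstack server create --flavor vgpu.small --image ubuntu-22.04 vgpu-test

HTH,
-Sylvain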
participants (17)
- Alex Song (宋文平)
- Arne Wiebalck
- Brin Zhang(张百林)
- Cedric
- Danny Webb
- Dmitriy Rabotyagov
- Gene Kuo
- Jonathan Rosser
- Li Liu
- Mahendra Paipuri
- Oliver Weinmann
- open infra
- PAIPURI Mahendra
- Sean Mooney
- Sylvain Bauza
- Tobias Urdin
- Ulrich Schwickerath