Re: Re: Experience with VGPUs

Alex Song (宋文平) songwenping at inspur.com
Tue Jan 17 02:30:51 UTC 2023


Hi, Ulrich:

Sean is an expert on vGPU management on the Nova side. To complete his answer, here are the usage steps if you are using Nova to manage MIGs, for example:
1. Partition the A100 (80G) GPU into 1g.10gb*1 + 2g.20gb*1 + 3g.40gb*1 (one 1g.10gb, one 2g.20gb and one 3g.40gb instance).
2. Add the device config in nova.conf:
[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701
[mdev_nvidia-699]
device_addresses = 0000:84:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3
3. Configure the flavor metadata with resources:VGPU=1 and create a VM with that flavor; the VM will be randomly allocated one MIG instance from [1g.10gb, 2g.20gb, 3g.40gb] (example commands below, after the second config snippet).
In step 2, if you have 2 A100 (80G) GPUs on one node using MIG, and the other GPU is partitioned into 1g.10gb*3 + 4g.40gb*1, the config may look like this:
[devices]
enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701,nvidia-702
[mdev_nvidia-699]
device_addresses = 0000:84:00.1, 0000:3b:00.1
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
[mdev_nvidia-701]
device_addresses = 0000:84:00.3
[mdev_nvidia-702]
device_addresses = 0000:3b:00.3
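
For step 3, a minimal sketch of the commands (the flavor name and sizes are just examples, not something prescribed by Nova):

openstack flavor create --vcpus 4 --ram 8192 --disk 40 vgpu-small
openstack flavor set vgpu-small --property resources:VGPU=1
openstack server create --flavor vgpu-small --image <your-image> --network <your-network> vgpu-vm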

In our product, we use Cyborg to manage the MIGs. In the legacy style we also needed to configure the MIGs like in Nova, which is difficult to maintain, especially when deploying OpenStack on k8s, so we removed these config options, discover the MIGs automatically, and support partitioning MIGs through the Cyborg API. By creating a device profile with the vGPU type traits (nvidia-699, nvidia-700), we can specify the MIG size used to create VMs.
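
For reference, the Cyborg flow looks roughly like the sketch below. The device profile name is made up, and the exact resource class and trait strings depend on your Cyborg driver and release, so treat them as illustrative only:

openstack accelerator device profile create mig-2g.20gb '[{"resources:VGPU": "1", "trait:CUSTOM_NVIDIA_700": "required"}]'
openstack flavor set vgpu-small --property accel:device_profile=mig-2g.20gb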

Kind regards

-----Original Message-----
From: Sean Mooney [mailto:smooney at redhat.com]
Sent: 16 January 2023 19:33
To: Ulrich Schwickerath <Ulrich.Schwickerath at cern.ch>; openstack-discuss at lists.openstack.org
Subject: Re: Re: Experience with VGPUs

On Mon, 2023-01-16 at 11:38 +0100, Ulrich Schwickerath wrote:
> Hi, all,
> 
> just to add to the discussion, at CERN we have recently deployed a 
> bunch of A100 GPUs in PCI passthrough mode, and are now looking into 
> improving their usage by using MIG. From the NOVA point of view things 
> seem to work OK, we can schedule VMs requesting a VGPU, the client 
> starts up and gets a license token from our NVIDIA license server 
> (distributing license keys in our private cloud is relatively easy in 
> our case). It's a PoC only for the time being, and we're not ready to 
> put that forward as we're facing issues with CUDA on the client (it 
> fails immediately in memory operations with 'not supported', still 
> investigating why this happens).
> 
> Once we get that working it would be nice to be able to have a more 
> fine grained scheduling so that people can ask for MIG devices of 
> different size. The other challenge is how to set limits on GPU 
> resources. Once the above issues have been sorted out we may want to 
> look into cyborg as well, thus we are quite interested in first experiences with this.

So those two use cases can kind of be fulfilled in Yoga.

In Yoga we finally merged support for unified limits via Keystone: https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/unified-limits-nova.html
This allows you to create quotas/limits on any resource class. That is our intended way for you to set limits on GPU resources, by leveraging the generic mdev support added in Xena to map different mdev types to different resource classes:
https://specs.openstack.org/openstack/nova-specs/specs/xena/implemented/generic-mdevs.html
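
As an illustration only (reusing the mdev types and PCI addresses from elsewhere in this thread; the custom resource class names are arbitrary, anything matching CUSTOM_* works), the Xena-era nova.conf mapping looks roughly like this:

[devices]
enabled_mdev_types = nvidia-699,nvidia-700
[mdev_nvidia-699]
device_addresses = 0000:84:00.1
mdev_class = CUSTOM_VGPU_1G_10GB
[mdev_nvidia-700]
device_addresses = 0000:84:00.2
mdev_class = CUSTOM_VGPU_2G_20GB
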
You can also use the provider configuration files (https://specs.openstack.org/openstack/nova-specs/specs/victoria/implemented/provider-config-file.html)
to simplify adding traits to the GPU resources in a declarative way to enable better scheduling, for example adding traits for the CUDA version supported by a given vGPU on a host.
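
A hedged provider.yaml sketch (the provider name below is made up; it should match the vGPU resource provider on the host as shown by "openstack resource provider list", and the trait name is just an example):

meta:
  schema_version: '1.0'
providers:
  - identification:
      name: computehost1_pci_0000_84_00_0
    traits:
      additional:
        - CUSTOM_CUDA_3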

So you could do something like this.

Assuming you have 2 GPU types, Alice and Bob: Alice supports CUDA 3 and has a small amount of VRAM (i.e. your older generation of GPUs); Bob is the new kid on the block with CUDA 9000 support and all the VRAM you could ask for (the latest and greatest GPU).

Using the Nova generic mdev feature you can map the Alice GPUs to CUSTOM_VGPU_ALICE and the Bob GPUs to CUSTOM_VGPU_BOB. Using unified limits you can set a limit/quota of 10 CUSTOM_VGPU_ALICE resources and 1 CUSTOM_VGPU_BOB resource on a given project. Using provider.yaml you can tag the Alice GPUs with CUSTOM_CUDA_3 and the Bob GPUs with CUSTOM_CUDA_9000. Then in the flavors you can create flavor definitions that request the different GPU types using resources:CUSTOM_VGPU_ALICE=1, but if you want to prevent images that need CUDA 9000 from being scheduled on the Alice GPUs, simply add trait:CUSTOM_CUDA_9000=required to the image.
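
Concretely, and only as a sketch (region, image and flavor names are placeholders, and this assumes nova is configured to use the unified limits quota driver), that could translate to something like:

# default quota of 10 Alice vGPUs and 1 Bob vGPU per project
openstack registered limit create --service nova --region RegionOne --default-limit 10 class:CUSTOM_VGPU_ALICE
openstack registered limit create --service nova --region RegionOne --default-limit 1 class:CUSTOM_VGPU_BOB
# flavor that requests an Alice vGPU
openstack flavor set alice.vgpu --property resources:CUSTOM_VGPU_ALICE=1
# image that must land on a CUDA 9000 capable GPU
openstack image set my-cuda-image --property trait:CUSTOM_CUDA_9000=required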

So if you have Yoga you have all of the above features available.
Xena does not give you the quota enforcement, but you can do all the scheduling bits; provider.yaml is entirely optional, but that has been around the longest.

Some of this would also just work for Cyborg, since it should be using custom resource classes to model the GPUs in Placement already.
We started adding generic PCI devices to Placement in Zed and we are completing it this cycle: https://specs.openstack.org/openstack/nova-specs/specs/2023.1/approved/pci-device-tracking-in-placement.html
So the same unified limits approach will work for PCI passthrough going forward too.

Hopefully this helps you meet those use cases.
We don't really have any good production examples of people combining all of the above features, so if you do use them as described, feedback is welcome.

We designed these features to all work together in this way, but since they are relatively new additions we suspect many operators have not used them yet or don't know about their existence.


> 
> Kind regards,
> 
> Ulrich
> 
> On 13.01.23 21:06, Dmitriy Rabotyagov wrote:
> > That said, the deb/rpm packages they are providing don't help 
> > much, as:
> > * There is no repo for them, so you need to download them manually 
> > from enterprise portal
> > * They can't be upgraded anyway, as driver version is part of the 
> > package name. And each package conflicts with any another one. So 
> > you need to explicitly remove old package and only then install new one.
> > And yes, you must stop all VMs before upgrading the driver, and no, you 
> > can't live migrate GPU mdev devices due to that not being 
> > implemented in qemu. So deb/rpm/generic driver doesn't matter in the end tbh.
> > 
> > 
> > пт, 13 янв. 2023 г., 20:56 Cedric <yipikai7 at gmail.com>:
> > 
> > 
> >     Ended up with the very same conclusions as Dmitriy regarding the
> >     use of Nvidia Vgrid for the VGPU use case with Nova, it works
> >     pretty well but:
> > 
> >     - respecting the licensing model as an operational constraint, note
> >     that guests need to reach a license server in order to get a token
> >     (could be via the Nvidia SaaS service or on-prem)
> >     - drivers for both guest and hypervisor are not easy to deploy
> >     and maintain at large scale. A year ago, hypervisor drivers were
> >     not packaged for Debian/Ubuntu, but built through a bash script,
> >     thus requiring additional automation work and careful
> >     attention regarding kernel updates/reboots of Nova hypervisors.
> > 
> >     Cheers
> > 
> > 
> >     On Fri, Jan 13, 2023 at 4:21 PM Dmitriy Rabotyagov
> >     <noonedeadpunk at gmail.com> wrote:
> >     >
> >     > You are saying that, like Nvidia GRID drivers are open-sourced while
> >     > in fact they're super far from being that. In order to download
> >     > drivers not only for hypervisors, but also for guest VMs you need to
> >     > have an account in their Enterprise Portal. It took me roughly 6
> >     weeks
> >     > of discussions with hardware vendors and Nvidia support to get a
> >     > proper account there. And that happened only after applying for
> >     their
> >     > Partner Network (NPN).
> >     > That still doesn't solve the issue of how to provide drivers to
> >     > guests, except pre-build a series of images with these drivers
> >     > pre-installed (we ended up with making a DIB element for that [1]).
> >     > Not to mention the need to distribute license tokens for
> >     guests and
> >     > the whole mess with compatibility between hypervisor and guest
> >     drivers
> >     > (as the guest driver can't be newer than the host one, and HVs can't be too
> >     > new either).
> >     >
> >     > It's not that I'm protecting AMD, but just saying that Nvidia is not
> >     > that straightforward either, and at least on paper AMD vGPUs look
> >     > easier both for operators and end-users.
> >     >
> >     > [1] https://github.com/citynetwork/dib-elements/tree/main/nvgrid
> >     >
> >     > >
> >     > > As for AMD cards, AMD stated that some of their MI series cards
> >     support SR-IOV for vGPUs. However, those drivers are never open
> >     sourced or provided as closed source to the public; only large cloud
> >     providers are able to get them. So I don't really recommend
> >     getting AMD cards for vGPU unless you are able to get support from
> >     them.
> >     > >
> >     >

