Nova support for multiple vGPUs?
Sylvain Bauza
sbauza at redhat.com
Thu Apr 21 10:57:13 UTC 2022
On Thu, 21 Apr 2022 at 12:26, Sean Mooney <smooney at redhat.com> wrote:
> On Wed, 2022-04-20 at 16:42 +0000, Sigurd Kristian Brinch wrote:
> > Hi,
> > As far as I can tell, libvirt/KVM supports multiple vGPUs per VM
> > (https://docs.nvidia.com/grid/14.0/grid-vgpu-release-notes-generic-linux-kvm/index.html#multiple-vgpu-support),
> > but in OpenStack/Nova it is limited to one vGPU per VM
> > (https://docs.openstack.org/nova/latest/admin/virtual-gpu.html#configure-a-flavor-controller)
> > Is there a reason for this limit?
> yes: nvidia.
> > What would be needed to enable multiple vGPUs in Nova?
> so you can technically do it today if you have 2 vGPUs from separate
> physical GPU cards, but nvidia does not support multiple vGPUs from the
> same card.
>
> nova does not currently provide a way to force the vGPU allocations to
> come from separate cards.
>
>
> well, that's not quite true, you could
>
> you would have to use the named group syntax to request them, so instead
> of
> resources:vgpu=2
>
> you would do
>
> resources_first_gpu_group:VGPU=1
> resources_second_gpu_group:VGPU=1
> group_policy=isolate
>
> the name after resources_ is an arbitrary group name, provided it
> conforms to this regex '([a-zA-Z0-9_-]{1,64})?'
>
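For illustration, the named-group workaround above can be set on a flavor
with the standard openstack client; the flavor name and sizes here are just
examples:

  openstack flavor create --vcpus 4 --ram 8192 --disk 40 vgpu-x2
  openstack flavor set vgpu-x2 \
    --property "resources_first_gpu_group:VGPU=1" \
    --property "resources_second_gpu_group:VGPU=1" \
    --property "group_policy=isolate"
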
> we strongly dislike this approach.
> first of all, group_policy=isolate is a global setting, meaning that no
> two request groups in the request can come from the same provider.
>
> that means you cannot have two SR-IOV VFs from the same physical NIC as a
> result of setting it.
> if you don't set group_policy, the default is none, which means you are
> no longer guaranteed that the vGPUs will come from different providers.
>
> so what you would need to do is extend placement to support isolating
> only specific named groups, and then expose that in nova via flavor extra
> specs. that is not particularly good UX, as it is rather complicated and
> means you need to understand how placement works in depth; placement
> should really be an implementation detail.
> i.e.
> resources_first_gpu_group:VGPU=1
> resources_second_gpu_group:VGPU=1
> group_isolate=first_gpu_group,second_gpu_group;...
>
> that fixes the conflict with SR-IOV and all other usages of resource
> groups, like bandwidth-based QoS.
>
> the slightly better approach would be to make this simpler to use by
> doing something like this:
>
> resources:vgpu=2
> vgpu:gpu_selection_policy=isolate
>
> we would still need the placement feature to isolate by group, but we can
> hide the detail from the end user with a pre-filter in nova
>
> https://github.com/openstack/nova/blob/eedbff38599addd4574084edac8b111c4e1f244a/nova/scheduler/request_filter.py
> which would transform the resource request and split it up into groups
> automatically.
>
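To make the pre-filter idea concrete, here is a rough standalone sketch of
the extra-spec transformation such a filter could apply. It is only an
illustration: the function name and the vgpu:gpu_selection_policy and
group_isolate keys follow the proposal above and are not an existing Nova or
placement interface.

  # Hypothetical sketch only: splits a flavor's resources:VGPU=N request
  # into N single-vGPU named groups, one per physical GPU.
  def split_vgpu_request(extra_specs):
      if extra_specs.get('vgpu:gpu_selection_policy') != 'isolate':
          return extra_specs

      count = int(extra_specs.pop('resources:VGPU', 0) or 0)
      if count < 2:
          if count == 1:
              # nothing to split, keep the original single-vGPU request
              extra_specs['resources:VGPU'] = '1'
          return extra_specs

      groups = []
      for i in range(count):
          group = 'vgpu%d' % i
          extra_specs['resources_%s:VGPU' % group] = '1'
          groups.append(group)
      # relies on the per-group isolation discussed above, which does not
      # exist yet; today placement only has the global group_policy=isolate
      extra_specs['group_isolate'] = ','.join(groups)
      return extra_specs

  print(split_vgpu_request({'resources:VGPU': '2',
                            'vgpu:gpu_selection_policy': 'isolate'}))
  # -> {'vgpu:gpu_selection_policy': 'isolate', 'resources_vgpu0:VGPU': '1',
  #     'resources_vgpu1:VGPU': '1', 'group_isolate': 'vgpu0,vgpu1'}
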
> this is a long way to say that if it were not for limitations in the
> IOMMU on nvidia GPUs, and the fact that they cannot map two vGPUs from
> one physical GPU to a single VM, this would already work out of the box
> with just resources:vgpu=2. perhaps when intel launch their discrete
> datacenter GPUs their vGPU implementation will not have this limitation.
> we do not prevent you from requesting 2 vGPUs today; it will just fail
> when qemu tries to use them.
>
> we also have not put the effort into working around the limitation in
> nvidia's hardware, since their drivers also used to block this until the
> ampere generation, and there has not been a large demand from users for
> multiple vGPUs.
>
> occasionally someone will ask about it, but in general people either do
> full GPU passthrough or use 1 vGPU instance.
>
>
Correct, that's why we have had this bug report open for a while, but we
don't really want to fix it for only one vendor.
> hopefully that will help.
> you can try the first approach today if you have more than one physical
> GPU per host,
> e.g.
> resources_first_gpu_group:VGPU=1
> resources_second_gpu_group:VGPU=1
> group_policy=isolate
>
> just be aware of the limitation of group_policy=isolate
>
Thanks Sean for explaining how to use a workaround.
>
> regards,
> sean
>
>
> >
> > BR
> > Sigurd
>
>
>