[openstack-dev] vGPUs support for Nova - Implementation

Mooney, Sean K sean.k.mooney at intel.com
Mon Oct 2 15:28:57 UTC 2017



> -----Original Message-----
> From: Dan Smith [mailto:dms at danplanet.com]
> Sent: Monday, October 2, 2017 3:53 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] vGPUs support for Nova - Implementation
> 
> >> I also think there is value in exposing vGPU in a generic way,
> irrespective of the underlying implementation (whether it is DEMU,
> mdev, SR-IOV or whatever approach Hyper-V/VMWare use).
> >
> > That is a big ask. To start with, all GPUs are not created equal, and
> > various vGPU functionality as designed by the GPU vendors is not
> > consistent, never mind the quirks added between different hypervisor
> > implementations. So I feel like trying to expose this in a generic
> > manner is, at least asking for problems, and more likely bound for
> > failure.
> 
> I feel the opposite. IMHO, Nova’s role in life is not to expose all the
> quirks of the underlying platform, but rather to provide a useful
> abstraction on top of those things. In spite of them.
[Mooney, Sean K] I have to agree with dan here.
vGPUs are a great example of where nova can add value by abstracting
the hypervisor specifics and provide a abstract api to allow requesting
vGPUS without having to encode the semantics of that api provide by the
hypervisor or hardware vendor in what we expose to the tenant.
> 
> > Nova already exposes plenty of hypervisor-specific functionality (or
> > functionality only implemented for one hypervisor), and that's fine.
> 
> And those bits of functionality are some of the most problematic we
> have. Among other reasons, they make it difficult for us to expose
> Thing 2.0, when we’ve encoded Thing 1.0 into our API so rigidly. This
> happens even within one virt driver where Thing 2.0 is significantly
> different than Thing 1.0.
> 
> The vGPU stuff seems well-suited for the generic modeling work that
> we’ve spent the last few years working on, and is a perfect example of
> an area where we can avoid piling on more debt to a not-abstract-enough
> “model” and move forward with the new one. That’s certainly my
> preference, and I think it’s actually less work than the debt-ridden
> way.
> 
> -—Dan
[Mooney, Sean K] I also agree that its likely less work to start fresh with 
the correct generic solution now, then try to adapt the pci passthough code
we have today to support vGPUs with out breaking the current
sriov and passthrough support. how vGPUs are virtualized is GPU vendor specific
so even within a single host you may neen to support multiple methods 
(sriov/mdev...) in a single virt dirver. For example a cloud/host with both amd and nvidia
Gpus which uses Libvirt would have to support generating the correct xml for both solutions.
> 
> 
> 
> _______________________________________________________________________
> ___
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-
> request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


More information about the OpenStack-dev mailing list