[openstack-dev] vGPUs support for Nova
Sahid Orentino Ferdjaoui
sferdjao at redhat.com
Mon Sep 25 15:00:50 UTC 2017
On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > There is a desire to expose the vGPUs resources on top of Resource
> > > Provider which is probably the path we should be going in the long
> > > term. I was not there for the last PTG and you probably already made a
> > > decision about moving in that direction anyway. My personal feeling is
> > > that it is premature.
> > >
> > > The nested Resource Provider work is not yet feature-complete and
> > > requires more reviewer attention. If we continue in the direction of
> > > Resource Provider, it will need at least 2 more releases to expose the
> > > vGPUs feature and that without the support of NUMA, and with the
> > > feeling of pushing something which is not stable/production-ready.
> > >
> > > It seems safer to first have the Resource Provider work well
> > > finalized/stabilized to be production-ready. Then on top of something
> > > stable we could start to migrate our current virt specific features
> > > like NUMA, CPU Pinning, Huge Pages and finally PCI devices.
> > >
> > > I'm talking about PCI devices in general because I think we should
> > > implement the vGPU on top of our /pci framework which is production
> > > ready and provides the support of NUMA.
> > >
> > > The hardware vendors are building their drivers using mdev, and the
> > > /pci framework currently understands only SR-IOV, but at a quick
> > > glance it does not seem complicated to make it support mdev.
> > >
> > > In the /pci framework we will have to:
> > >
> > > * Update the PciDevice object fields to accept a NULL value for
> > > 'address' and add a new field 'uuid'
> > > * Update PciRequest to handle a new tag like 'vgpu_types'
> > > * Update PciDeviceStats to also maintain pool of vGPUs
> > >
> > > The operators will have to create alias(-es) and configure
> > > flavors. Basically most of the logic is already implemented and the
> > > method 'consume_request' is going to select the right vGPUs according
> > > to the request.
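To make the three /pci changes listed above more concrete, here is a minimal sketch of the idea. It uses simplified stand-in classes (Device, Request, DeviceStats), which are hypothetical and are not Nova's actual PciDevice, PciRequest or PciDeviceStats objects. It only assumes what the list says: a device may have no 'address' but a 'uuid', a request carries 'vgpu_types', and the stats object keeps pools from which 'consume_request' picks devices.

```python
# Illustrative only: simplified stand-ins, not Nova's real /pci classes.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    # A plain PCI device has an address; an mdev-backed vGPU would have
    # a uuid instead, so both fields are optional here.
    address: Optional[str] = None
    uuid: Optional[str] = None
    vgpu_type: Optional[str] = None


@dataclass
class Request:
    count: int
    # Hypothetical tag: vGPU types the request is willing to accept.
    vgpu_types: List[str] = field(default_factory=list)


class DeviceStats:
    """Keeps free devices grouped into pools keyed by vGPU type."""

    def __init__(self, devices):
        self.pools = {}
        for dev in devices:
            self.pools.setdefault(dev.vgpu_type, []).append(dev)

    def consume_request(self, request):
        """Return `count` devices of an acceptable type, or None."""
        for vgpu_type in request.vgpu_types:
            pool = self.pools.get(vgpu_type, [])
            if len(pool) >= request.count:
                # Hand out the first devices and keep the rest in the pool.
                taken = pool[:request.count]
                self.pools[vgpu_type] = pool[request.count:]
                return taken
        return None


stats = DeviceStats([
    Device(uuid="u1", vgpu_type="type-a"),
    Device(uuid="u2", vgpu_type="type-a"),
    Device(address="0000:81:00.1"),
])
picked = stats.consume_request(Request(count=1, vgpu_types=["type-a"]))
```

The type name "type-a" and the uuids are made up for illustration; the real pool keys and matching rules would follow whatever the alias and flavor configuration defines.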
> > >
> > > In /virt we will have to:
> > >
> > > * Update the field 'pci_passthrough_devices' to also include GPU
> > > devices.
> > > * Update attach/detach PCI device to handle vGPUs
> > >
> > > We have a few people interested in working on it, so we could
> > > certainly make this feature available for Queens.
> > >
> > > I can take the lead updating/implementing the PCI and libvirt driver
> > > part, and I'm sure Jianghua Wang will be happy to take the lead for
> > > the virt XenServer part.
> > >
> > > And I trust Jay, Stephen and Sylvain to follow the developments.
> >
> > I understand the desire to get something in to Nova to support vGPUs,
> > and I understand that the existing /pci modules represent the
> > fastest/cheapest way to get there.
> >
> > I won't block you from making any of the above changes, Sahid. I'll even
> > do my best to review them. However, I will be primarily focusing this
> > cycle on getting the nested resource providers work feature-complete for
> > (at least) SR-IOV PF/VF devices.
> >
> > The decision of whether to allow an approach that adds more to the
> > existing /pci module is ultimately Matt's.
> >
> > Best,
> > -jay
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> Nested resource providers is not merged or production ready because we
> haven't made it a priority. We've certainly talked about it and Jay has had
> patches proposed for several releases now though.
>
> Building vGPU support into the existing framework, which only a couple of
> people understand - certainly not me, might be a short-term gain but is just
> more technical debt we have to pay off later, and delays any focus on nested
> resource providers for the wider team.
>
> At the Queens PTG it was abundantly clear that many features are dependent
> on nested resource providers, including several networking-related features
> like bandwidth-based scheduling.
>
> The priorities for placement/scheduler in Queens are:
>
> 1. Dan Smith's migration allocations cleanup.
> 2. Alternative hosts for reschedules with cells v2.
> 3. Nested resource providers.
>
> All of these are in progress and need review.
>
> I personally don't think we should abandon the plan to implement vGPU
> support with nested resource providers without first seeing any code changes
> for it as a proof of concept. It also sounds like we have a pretty simple
> staggered plan for rolling out vGPU support so it's not very detailed to
> start. The virt driver reports vGPU inventory and we decorate the details
> later with traits (which Alex Xu is working on and needs review).
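As a rough illustration of the staggered plan described above (inventory first, traits later), the sketch below builds a placement-style inventory record for vGPUs and a separate list of traits. The helper names, the "VGPU" resource class name and the CUSTOM_VGPU_* trait names are assumptions made for this example, not taken from any proposed patch. The inventory keys mirror the fields placement uses for inventory records.

```python
# Illustrative only: helper names and trait naming are hypothetical.

def vgpu_inventory(total_vgpus):
    """Build a placement-style inventory record for a host's vGPUs."""
    return {
        "VGPU": {  # assumed resource class name
            "total": total_vgpus,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": total_vgpus,
            "step_size": 1,
            "allocation_ratio": 1.0,
        }
    }


def vgpu_traits(vgpu_types):
    """Turn vGPU type names into custom trait names, added in a later step."""
    return sorted(
        "CUSTOM_VGPU_" + name.upper().replace("-", "_")
        for name in vgpu_types
    )


inventory = vgpu_inventory(8)
traits = vgpu_traits(["type-a", "type-b"])
```

The point of the split is that the first step only needs a count of vGPUs per provider; the type details can be layered on afterwards without changing the inventory format.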
>
> Sahid, you could certainly implement a separate proof of concept and make
> that available. If the nested resource providers-based change hits major
> issues, takes far too long, or carries too much risk, then we at least have
> a contingency plan. But I don't expect that to get review priority, and
> you'd have to accept that it might not get merged since we want to use
> nested resource providers.
That seems fair. I understand your desire to make the implementation
on Resource Provider a priority, and I'm with you. In general, my
preference is not to stop progress on virt features because we have a
new "product" ongoing.
> Either way we are going to need solid functional testing and that functional
> testing should be written against the API as much as possible so that it
> works regardless of the backend implementation of the feature. One of the
> big things we failed at in Pike was not doing enough functional testing of
> move operations with claims in the scheduler earlier in the cycle. That all
> came in late and we're still fixing bugs as a result.
That's very true; most of the time we are asking our users to be
beta-testers, which is one more reason why my preference is for a real
deprecation phase.
> If we can get started early on the functional testing for vGPUs, then work
> both implementations in parallel, we should be able to retain the functional
> tests and determine which implementation we ultimately need to go with
> probably sometime in the second milestone.
>
> --
>
> Thanks,
>
> Matt
>