[openstack-dev] vGPUs support for Nova

Sahid Orentino Ferdjaoui sferdjao at redhat.com
Tue Sep 26 12:45:37 UTC 2017


On Mon, Sep 25, 2017 at 04:59:04PM +0000, Jianghua Wang wrote:
> Sahid,
> 
> Just share some background: XenServer doesn't expose vGPUs as mdev
> or PCI devices.

That does not make any sense. There is a physical device (PCI) which
provides functions (vGPUs). These functions are exposed through the
mdev framework. What you need is the mdev UUID related to a specific
vGPU, and I'm sure that XenServer is going to expose it. Something
which XenServer may not expose is the NUMA node where the physical
device is plugged in, but in such a situation you could still use
sysfs.
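For instance, assuming the usual Linux sysfs layout for mediated devices (the exact paths can vary by kernel and driver version), the NUMA node of the parent PCI device could be resolved along these lines; the sysfs_root parameter is only there to make the sketch testable:

```python
import os

def mdev_numa_node(mdev_uuid, sysfs_root='/sys'):
    """Return the NUMA node of the physical PCI device backing an mdev.

    /sys/bus/mdev/devices/<uuid> is a symlink into the parent PCI
    device's sysfs directory; that directory's 'numa_node' file holds
    the node number (-1 when no NUMA affinity is reported).
    """
    link = os.path.join(sysfs_root, 'bus/mdev/devices', mdev_uuid)
    parent = os.path.dirname(os.path.realpath(link))
    with open(os.path.join(parent, 'numa_node')) as f:
        return int(f.read().strip())
```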

> I proposed a spec about one year ago to make fake PCI devices so
> that we could use the existing PCI mechanism to cover vGPUs. But
> that was not a good design and it got strong objections. After that,
> we switched to using resource providers, following the advice from
> the core team.
>
> Regards,
> Jianghua
> 
> -----Original Message-----
> From: Sahid Orentino Ferdjaoui [mailto:sferdjao at redhat.com] 
> Sent: Monday, September 25, 2017 11:01 PM
> To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] vGPUs support for Nova
> 
> On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> > On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > > There is a desire to expose the vGPUs resources on top of Resource 
> > > > Provider which is probably the path we should be going in the long 
> > > > term. I was not there for the last PTG and you probably already 
> > > > made a decision about moving in that direction anyway. My personal 
> > > > feeling is that it is premature.
> > > > 
> > > > The nested Resource Provider work is not yet feature-complete and 
> > > > requires more reviewer attention. If we continue in the direction 
> > > > of Resource Providers, it will need at least two more releases to 
> > > > expose the vGPU feature, and that without NUMA support, with the 
> > > > feeling of pushing something which is not stable/production-ready.
> > > > 
> > > > It seems safer to first have the Resource Provider work 
> > > > finalized/stabilized and production-ready. Then, on top of 
> > > > something stable, we could start to migrate our current 
> > > > virt-specific features like NUMA, CPU pinning, huge pages and 
> > > > finally PCI devices.
> > > > 
> > > > I'm talking about PCI devices in general because I think we should 
> > > > implement vGPU support on top of our /pci framework, which is 
> > > > production-ready and provides NUMA support.
> > > > 
> > > > Hardware vendors are building their drivers using mdev, and while 
> > > > the /pci framework currently understands only SR-IOV, at a quick 
> > > > glance it does not seem complicated to make it support mdev.
> > > > 
> > > > In the /pci framework we will have to:
> > > > 
> > > > * Update the PciDevice object fields to accept NULL value for
> > > >    'address' and add new field 'uuid'
> > > > * Update PciRequest to handle a new tag like 'vgpu_types'
> > > > * Update PciDeviceStats to also maintain pool of vGPUs
> > > > 
> > > > The operators will have to create alias(-es) and configure 
> > > > flavors. Basically most of the logic is already implemented, and 
> > > > the method 'consume_request' is going to select the right vGPUs 
> > > > according to the request.
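A minimal sketch of that pool-based selection (names analogous to Nova's PciDeviceStats/'consume_request', not the actual code; the real logic also handles spec matching, NUMA, and more): devices are grouped into pools by their properties, and a request tagged with a vGPU type consumes from a matching pool.

```python
# Hypothetical, simplified pool-based device selection.
def consume_request(pools, spec, count=1):
    """Consume 'count' devices from the first pool matching 'spec';
    return the consumed devices, or None when the request cannot be met."""
    for pool in pools:
        if all(pool.get(k) == v for k, v in spec.items()):
            if len(pool['devices']) >= count:
                return [pool['devices'].pop() for _ in range(count)]
    return None

pools = [{'vendor_id': '10de', 'vgpu_type': 'nvidia-35',
          'devices': ['mdev-uuid-1', 'mdev-uuid-2']}]
taken = consume_request(pools, {'vgpu_type': 'nvidia-35'})
# one vGPU consumed from the pool, one left for the next request
```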
> > > > 
> > > > In /virt we will have to:
> > > > 
> > > > * Update the field 'pci_passthrough_devices' to also include GPU
> > > >    devices.
> > > > * Update attach/detach PCI device to handle vGPUs
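On the libvirt side, attaching a vGPU to a guest would use libvirt's mdev hostdev device (available in recent libvirt releases); the UUID below is purely illustrative:

```xml
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
  </source>
</hostdev>
```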
> > > > 
> > > > We have a few people interested in working on it, so we could 
> > > > certainly make this feature available for Queens.
> > > > 
> > > > I can take the lead updating/implementing the PCI and libvirt 
> > > > driver part, I'm sure Jianghua Wang will be happy to take the lead 
> > > > for the virt XenServer part.
> > > > 
> > > > And I trust Jay, Stephen and Sylvain to follow the developments.
> > > 
> > > I understand the desire to get something in to Nova to support 
> > > vGPUs, and I understand that the existing /pci modules represent the 
> > > fastest/cheapest way to get there.
> > > 
> > > I won't block you from making any of the above changes, Sahid. I'll 
> > > even do my best to review them. However, I will be primarily 
> > > focusing this cycle on getting the nested resource providers work 
> > > feature-complete for (at least) SR-IOV PF/VF devices.
> > > 
> > > The decision of whether to allow an approach that adds more to the 
> > > existing /pci module is ultimately Matt's.
> > > 
> > > Best,
> > > -jay
> > > 
> > > __________________________________________________________________________
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > 
> > Nested resource providers is not merged or production ready because we 
> > haven't made it a priority. We've certainly talked about it and Jay 
> > has had patches proposed for several releases now though.
> > 
> > Building vGPU support into the existing framework, which only a couple 
> > of people understand (certainly not me), might be a short-term gain 
> > but is just more technical debt we have to pay off later, and it delays 
> > any focus on nested resource providers for the wider team.
> > 
> > At the Queens PTG it was abundantly clear that many features are 
> > dependent on nested resource providers, including several 
> > networking-related features like bandwidth-based scheduling.
> > 
> > The priorities for placement/scheduler in Queens are:
> > 
> > 1. Dan Smith's migration allocations cleanup.
> > 2. Alternative hosts for reschedules with cells v2.
> > 3. Nested resource providers.
> > 
> > All of these are in progress and need review.
> > 
> > I personally don't think we should abandon the plan to implement vGPU 
> > support with nested resource providers without first seeing any code 
> > changes for it as a proof of concept. It also sounds like we have a 
> > pretty simple staggered plan for rolling out vGPU support so it's not 
> > very detailed to start. The virt driver reports vGPU inventory and we 
> > decorate the details later with traits (which Alex Xu is working on and needs review).
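A hedged illustration of that staggered plan: the virt driver would report a simple vGPU inventory to placement first, with qualitative details added later via traits. The dict layout below follows the placement inventory format; the 'VGPU' resource class name and the counts are assumptions here.

```python
# Sketch of a virt driver reporting vGPU inventory for its compute node.
def get_vgpu_inventory(total_vgpus):
    """Return a placement-style inventory entry for vGPUs."""
    return {
        'VGPU': {
            'total': total_vgpus,
            'min_unit': 1,
            'max_unit': total_vgpus,  # one instance may consume up to all
            'step_size': 1,
            'reserved': 0,
            'allocation_ratio': 1.0,
        },
    }
```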
> > 
> > Sahid, you could certainly implement a separate proof of concept and 
> > make that available if the nested resource providers-based change hits 
> > major issues or goes far too long and has too much risk, then we have 
> > a contingency plan at least. But I don't expect that to get review 
> > priority and you'd have to accept that it might not get merged since 
> > we want to use nested resource providers.
> 
> That seems fair. I understand your desire to make the implementation on Resource Providers a priority, and I'm with you. In general my preference is not to stop progress on virt features because we have a new "product" ongoing.
> 
> > Either way we are going to need solid functional testing and that 
> > functional testing should be written against the API as much as 
> > possible so that it works regardless of the backend implementation of 
> > the feature. One of the big things we failed at in Pike was not doing 
> > enough functional testing of move operations with claims in the 
> > scheduler earlier in the cycle. That all came in late and we're still fixing bugs as a result.
> 
> That is very true, and most of the time we are asking our users to be beta-testers, which is one more reason why my preference is for a real deprecation phase.
> 
> > If we can get started early on the functional testing for vGPUs, then 
> > work both implementations in parallel, we should be able to retain the 
> > functional tests and determine which implementation we ultimately need 
> > to go with probably sometime in the second milestone.
> > 
> > --
> > 
> > Thanks,
> > 
> > Matt
> > 
> 


