[openstack-dev] vGPUs support for Nova - Implementation

Sahid Orentino Ferdjaoui sferdjao at redhat.com
Fri Sep 29 08:53:44 UTC 2017


On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote:
> On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:
> > Please consider the support of MDEV for the /pci framework which
> > provides support for vGPUs [0].
> > 
> > According to the discussion [1]:
> > 
> > With this first implementation, which could be used as a skeleton for
> > implementing PCI Devices in Resource Tracker,
> 
> I'm not entirely sure what you're referring to above as "implementing PCI
> devices in Resource Tracker". Could you elaborate? The resource tracker
> already embeds a PciManager object that manages PCI devices, as you know.
> Perhaps you meant "implement PCI devices as Resource Providers"?

A PciManager? I know that we have a PCI_DEVICE field :) - I guess a
virt driver can return an inventory with the total number of PCI
devices. As for a manager, I'm not sure.

You still have to define "traits". Basically, for physical network
devices, users want to select a device according to its physical
network, its placement on the host (NUMA), its bandwidth
capability... For GPUs it's the same story. *And I have not even
mentioned devices which support virtual functions.*

So that is what you plan to do for this release :) - Realistically, I
don't think we are close to having something ready for production.

Jay, I have a question: why don't you start by exposing NUMA?

> > we provide support for
> > attaching vGPUs to guests, and also for providing affinity per NUMA
> > node. Another important point is that this implementation can take
> > advantage of ongoing specs like PCI NUMA policies.
> > 
> > * The Implementation [0]
> > 
> > [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> > [PATCH 02/13] pci: add for PciDevice object new field mdev
> > [PATCH 03/13] pci: generalize object unit-tests for different
> > [PATCH 04/13] pci: add support for mdev device type request
> > [PATCH 05/13] pci: generalize stats unit-tests for different
> > [PATCH 06/13] pci: add support for mdev devices type devspec
> > [PATCH 07/13] pci: add support for resource pool stats of mdev
> > [PATCH 08/13] pci: make manager to accept handling mdev devices
> > 
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. We admit it's a lot of patches, but most of them
> > are small, and the logic behind them is basically to make the
> > framework understand two new fields, MDEV_PF and MDEV_VF.
> 
> That's not really "generalizing the PCI framework to handle MDEV devices" :)
> More like it's just changing the /pci module to understand a different
> device management API, but ok.

If you prefer to call it that :) - The point is that /pci manages
physical devices; it can pass through the whole device or its virtual
functions exposed through SR-IOV or MDEV.

> > [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> > [PATCH 10/13] libvirt: report mdev resources
> > [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
> > 
> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the PCI manager. This
> > part can be reused for Resource Providers.
> 
> Perhaps, but the idea behind the resource providers framework is to treat
> devices as generic things. Placement doesn't need to know about the
> particular device attachment status.
> 
> > [PATCH 12/13] functional: rework fakelibvirt host pci devices
> > [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
> > 
> > Here we reuse 100% of the functional tests used for SR-IOV
> > devices. Again, this part can be reused for Resource Providers.
> 
> Probably not, but I'll take a look :)
> 
> For the record, I have zero confidence in any existing "functional" tests
> for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately,
> these features often require hardware that the upstream community CI
> lacks, or that depends on libraries, drivers and kernel versions that
> really aren't available to non-bleeding-edge users (or users with very
> deep pockets).

That's a good point. If you are not confident, don't you think it's
premature to move forward on implementing something new without
well-trusted functional tests?

> > * The Usage
> > 
> > There is no difference between SR-IOV and MDEV from the operator's
> > point of view: operators who know how to expose SR-IOV devices in Nova
> > already know how to expose MDEV devices (vGPUs).
> > 
> > Operators will be able to expose MDEV devices in the same manner as
> > they expose SR-IOV:
> > 
> >   1/ Configure the device whitelist
> > 
> >   ['{"vendor_id":"10de"}']
> > 
> >   2/ Create aliases
> > 
> >   [{"vendor_id":"10de", "name":"vGPU"}]
> > 
> >   3/ Configure the flavor
> > 
> >   openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
> > 
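To make the three steps above concrete, here is roughly how it would
look in nova.conf on the compute node, reusing the exact whitelist and
alias values quoted above (the flavor name "vgpu.small" is only an
example):

  [pci]
  # whitelist every device from vendor 10de (NVIDIA) for passthrough
  passthrough_whitelist = {"vendor_id":"10de"}
  # expose the whitelisted devices to users under the alias "vGPU"
  alias = {"vendor_id":"10de", "name":"vGPU"}

Then one vGPU per instance is requested through the flavor:

  openstack flavor set vgpu.small --property "pci_passthrough:alias"="vGPU:1"
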
> > * Limitations
> > 
> > MDEV devices do not provide a 'product_id' but an 'mdev_type' (e.g.
> > nvidia-10), which should be taken into account to identify exactly
> > which resource users can request. To provide that support we have to
> > add a new field 'mdev_type' so aliases could be something like:
> > 
> >   {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
> >   {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}
> > 
> > I do plan to add that, but first I need support from upstream to
> > continue that work.
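
To illustrate, once that 'mdev_type' field exists, operators could
expose one alias per vGPU profile and request it per flavor; a rough
sketch, not implemented yet, with example flavor names:

  [pci]
  passthrough_whitelist = {"vendor_id":"10de"}
  # one alias per mdev type (vGPU profile)
  alias = {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
  alias = {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}

  openstack flavor set vgpu.nvidia-10 --property "pci_passthrough:alias"="alias-nvidia-10:1"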
> 
> As mentioned in IRC and the previous ML discussion, my focus is on the
> nested resource providers work and reviews, along with the other two
> top-priority scheduler items (move operations and alternate hosts).
> 
> I'll do my best to look at your patch series, but please note it's lower
> priority than a number of other items.

No worries, the code is here, tested, fully functional and
production-ready; I made an effort to make it available at the very
beginning of the release. With some goodwill we could fix any bugs and
have support for vGPUs in Queens.

> One thing that would be very useful, Sahid, is if you could get with Eric Fried
> (efried) on IRC and discuss with him the "generic device management" system
> that was discussed at the PTG. It's likely that the /pci module is going to
> be overhauled in Rocky and it would be good to have the mdev device
> management API requirements included in that discussion.
> 
> Best,
> -jay
> 
> > 
> > [0] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
> > [1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html


