[openstack-dev] [nova] [neutron] PCI pass-through network support

Jiang, Yunhong yunhong.jiang at intel.com
Fri Jan 10 19:17:22 UTC 2014

Brian, the issue with 'class name' is that libvirt currently does not provide that information; otherwise we would be glad to add it :(
But this is a good point, and we have already considered it. One solution is to retrieve it with code that reads the PCI configuration space directly, but that's not easy, especially since different platforms have different methods of accessing the configuration space. A workaround (at least as a first step) is a user-defined property, so that the user can define it based on the configuration space.

The issue with udev is that it's Linux-specific, and it may even vary across distributions.


From: Brian Schott [mailto:brian.schott at nimbisservices.com]
Sent: Thursday, January 09, 2014 11:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


The idea of PCI flavors is great, and using vendor_id and product_id makes sense, but I could see a case for adding the class name, such as 'VGA compatible controller'. Otherwise, slightly different generations of hardware will require custom whitelist setups on each compute node.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] (rev a1)

On the flip side, vendor_id and product_id might not be sufficient. Suppose I have two identical NICs, one for nova's internal use and the other for guest tenants. In that case, bus numbering may be required.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] (rev a1)

Some possible combinations:

# take 2 GPUs
     {"vendor_id": "NVIDIA Corporation G71", "product_id": "GeForce 7900 GTX", "name": "GPU"},

# only take the GPU on PCI bus 2
     {"vendor_id": "NVIDIA Corporation G71", "product_id": "GeForce 7900 GTX", "bus_id": "02:", "name": "GPU"},

# take a specific device by its full bus address
     {"bus_id": "01:00.0", "name": "GPU"},
     {"bus_id": "02:00.0", "name": "GPU"},

# take any device of a given class
     {"class": "VGA compatible controller", "name": "GPU"},

# take by product_id alone
     {"product_id": "GeForce 7900 GTX", "name": "GPU"},
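The combinations above amount to a simple matching predicate. A minimal sketch in Python of how such specs could be evaluated against a discovered device (the dict layout, the prefix treatment of "bus_id", and the sample values are my assumptions for illustration, not nova's actual data model):

```python
# Sketch: match a discovered PCI device against whitelist-style specs.
# A spec matches when every key it defines agrees with the device;
# "bus_id" is treated as a prefix so "02:" selects everything on bus 02.

def spec_matches(spec, device):
    for key, wanted in spec.items():
        if key == "name":  # group label, not a matching criterion
            continue
        if key == "bus_id":
            if not device.get("bus_id", "").startswith(wanted):
                return False
        elif device.get(key) != wanted:
            return False
    return True

device = {
    "bus_id": "02:00.0",
    "vendor_id": "NVIDIA Corporation G71",
    "product_id": "GeForce 7900 GTX",
    "class": "VGA compatible controller",
}

specs = [
    {"bus_id": "01:00.0", "name": "GPU"},              # wrong bus, no match
    {"bus_id": "02:", "name": "GPU"},                  # bus prefix, matches
    {"class": "VGA compatible controller", "name": "GPU"},  # class, matches
]

matches = [s for s in specs if spec_matches(s, device)]
```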

I know you guys are thinking of PCI devices, but any thought of mapping to something like udev rather than PCI? Supporting udev rules might be easier and more robust than inventing something new.


Brian Schott, CTO
Nimbis Services, Inc.
brian.schott at nimbisservices.com
ph: 443-274-6064  fx: 443-274-6060

On Jan 9, 2014, at 12:47 PM, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is just refinement.

On 9 January 2014 17:49, Robert Li (baoli) <baoli at cisco.com> wrote:

Hi Folks,

With John joining the IRC meetings, we have had a couple of productive sessions so far in an effort to come to consensus and move forward. Thanks, John, for doing that, and I appreciate everyone's effort to make it to the daily meeting. Let's reconvene on Monday.

But before that, and based on today's conversation on IRC, I'd like to say a few things. First of all, I think we need to agree on the terminology we have been using. With the current nova PCI passthrough:

        PCI whitelist: defines all the available PCI passthrough devices on a compute node. pci_passthrough_whitelist=[{ "vendor_id":"xxxx","product_id":"xxxx"}]
        PCI Alias: criteria defined on the controller node with which requested PCI passthrough devices can be selected from all the PCI passthrough devices available in a cloud.
                Currently it has the following format: pci_alias={"vendor_id":"xxxx", "product_id":"xxxx", "name":"str"}

        nova flavor extra_specs: a request for PCI passthrough devices can be specified with extra_specs, in a format such as: "pci_passthrough:alias"="name:count"
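To make the "name:count" format concrete, here is a small hedged sketch of how such an extra_specs request could be decomposed into (alias, count) pairs; the comma-separated multi-alias form and the function name are my assumptions for illustration, not nova's actual parser:

```python
# Sketch: decompose a flavor extra_specs PCI request of the form
#   "pci_passthrough:alias" = "name_1:count_1, name_2:count_2, ..."
# into (alias_name, count) pairs.

def parse_pci_requests(extra_specs):
    raw = extra_specs.get("pci_passthrough:alias", "")
    requests = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, count = part.partition(":")
        requests.append((name, int(count or 1)))  # default count of 1
    return requests

# e.g. a flavor asking for two devices from the "GPU" alias
reqs = parse_pci_requests({"pci_passthrough:alias": "GPU:2"})
```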

As you can see, currently a PCI alias has a name and is defined on the controller. The implication is that when matching it against the PCI devices, the vendor_id and product_id have to be checked against all the available PCI devices until a match is found. The name is only used for reference in the extra_specs. On the other hand, the whitelist is basically the same as the alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI flavors as Yongli puts it). Without introducing other complexities, and with a little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"xxxx","product_id":"xxxx", "name":"str"}]

By doing so, we eliminate the PCI alias. We call the "name" above a PCI group name. You can think of it as combining the definitions of the existing whitelist and PCI alias. And believe it or not, a PCI group is actually a PCI alias. However, with that change of thinking, a lot of benefits can be harvested:

         * the implementation is significantly simplified
         * provisioning is simplified by eliminating the PCI alias
         * a compute node only needs to report stats with something like PCI group name:count. A compute node processes all the PCI passthrough devices against the whitelist, and assigns a PCI group based on the whitelist definition.
         * on the controller, we may only need to define the PCI group names. If we use a nova API to define PCI groups (which could be private or public, for example), one potential benefit, among other things (validation, etc.), is that they can be owned by the tenant that creates them. Thus, wholesaling of PCI passthrough devices also becomes possible.
         * scheduler only works with PCI group names.
         * request for PCI passthrough device is based on PCI-group
         * deployers can provision the cloud based on the PCI groups
         * Particularly for SRIOV, deployers can design SRIOV PCI groups based on network connectivities.
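The compute-node side of the proposal above could be sketched as follows: tag each discovered device with the group name of the first whitelist entry it matches, then report per-group counts. All the names, vendor/product IDs, and the first-match policy here are illustrative assumptions, not the actual nova code:

```python
from collections import Counter

# Sketch: whitelist entries carry the PCI group name directly, as proposed.
# Vendor/product IDs below are placeholders for illustration.
whitelist = [
    {"vendor_id": "8086", "product_id": "10fb", "name": "10G-NIC"},
    {"vendor_id": "10de", "product_id": "0293", "name": "GPU"},
]

def assign_group(device, whitelist):
    """Return the group name of the first matching whitelist entry, or None."""
    for entry in whitelist:
        if all(device.get(k) == v for k, v in entry.items() if k != "name"):
            return entry["name"]
    return None  # device is not exposed for passthrough

def group_stats(devices, whitelist):
    """Aggregate discovered devices into group_name -> count stats."""
    groups = (assign_group(d, whitelist) for d in devices)
    return Counter(g for g in groups if g is not None)

devices = [
    {"vendor_id": "8086", "product_id": "10fb"},
    {"vendor_id": "8086", "product_id": "10fb"},
    {"vendor_id": "10de", "product_id": "0293"},
    {"vendor_id": "8086", "product_id": "1521"},  # not whitelisted
]

stats = group_stats(devices, whitelist)
```

The scheduler would then only ever see opaque pairs like "10G-NIC": 2, which is exactly the simplification the bullet list argues for.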

Further, to support SRIOV, we are saying that PCI group names can be used not only in the extra specs but also in the --nic option and the neutron commands. This allows the full flexibility and functionality afforded by SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI device's class.

For vnic-type (or nic-type), we are saying that it defines the link characteristics of the NIC that is attached to a VM: a NIC that's connected to a virtual switch, a NIC that is connected to a physical switch, or a NIC that is connected to a physical switch with a host macvtap device in between. The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are welcome by email.


OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org


