[openstack-dev] [nova] [neutron] PCI pass-through network support

Ian Wells ijw.ubuntu at cack.org.uk
Mon Jan 13 11:59:39 UTC 2014


Irena, have a word with Bob (rkukura on IRC, East coast); he was already
talking about what would be needed and should be able to help you.
Conveniently, he's also core. ;)
-- 
Ian.


On 12 January 2014 22:12, Irena Berezovsky <irenab at mellanox.com> wrote:

> Hi John,
> Thank you for taking the initiative and summing up the work that needs to be
> done to provide PCI pass-through network support.
> The only item I think is missing is the neutron support for PCI
> pass-through. Currently we have the Mellanox plugin, which supports PCI
> pass-through assuming the Mellanox adapter card's embedded switch technology.
> But in order to have fully integrated PCI pass-through networking support for
> the use cases Robert listed in his previous mail, generic neutron PCI
> pass-through support is required. This can be enhanced with vendor-specific
> tasks that may differ (Mellanox embedded switch vs. Cisco 802.1BR), but there
> is still a common part: being a PCI-aware mechanism driver.
> I have already started on a definition for this part:
>
> https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
> I also plan to start coding soon.
>
> Depending on how it goes, I can also take the nova parts that integrate with
> neutron APIs from item 3.
>
> Regards,
> Irena
>
> -----Original Message-----
> From: John Garbutt [mailto:john at johngarbutt.com]
> Sent: Friday, January 10, 2014 4:34 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
> Apologies for this top post; I just want to move this discussion towards
> action.
>
> I am traveling next week so it is unlikely that I can make the meetings.
> Sorry.
>
> Can we please agree on some concrete actions, and who will do the coding?
> This also means raising new blueprints for each item of work.
> I am happy to review and eventually approve those blueprints, if you email
> me directly.
>
> Ideas are taken from what we started to agree on, mostly written up here:
> https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions
>
>
> What doesn't need doing...
> ====================
>
> We have PCI whitelist and PCI alias at the moment; let's keep those names
> the same for now.
> I personally prefer PCI-flavor rather than PCI-alias, but let's discuss
> any rename separately.
>
> We seemed happy with the current system (roughly) around GPU passthrough:
> nova flavor-key <three_GPU_attached_30GB> set \
>     "pci_passthrough:alias"="large_GPU:1,small_GPU:2"
> nova boot --image some_image --flavor <three_GPU_attached_30GB> <some_name>
>
> Again, we seemed happy with the current PCI whitelist.
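>
> For reference, the whitelist today is just a nova.conf entry of the form
> Robert quotes further down (placeholder ids):
>
> pci_passthrough_whitelist=[{"vendor_id":"xxxx","product_id":"xxxx"}]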
>
> Sure, we could optimise the scheduling, but again, please keep that a
> separate discussion.
> Something in the scheduler needs to know how many of each PCI alias are
> available on each host.
> How that information gets there can be changed at a later date.
>
> PCI alias is in config, but it's probably better defined using host
> aggregates, or some custom API.
> But let's leave that for now, and discuss it separately.
> If the need arises, we can migrate away from the config.
>
>
> What does need doing...
> ==================
>
> 1) API & CLI changes for "nic-type", and associated tempest tests
>
> * Add a user-visible "nic-type" so users can express one of several network
> types.
> * We need a default nic-type for when the user doesn't specify one (might
> default to SRIOV in some cases).
> * We can easily test the case where the default is virtual and the user
> expresses a preference for virtual.
> * The above is much better than not testing it at all.
>
> nova boot --flavor m1.large --image <image_id>
>   --nic net-id=<net-id-1>
>   --nic net-id=<net-id-2>,nic-type=fast
>   --nic net-id=<net-id-3>,nic-type=fast <vm-name>
>
> or
>
> neutron port-create
>   --fixed-ip subnet_id=<subnet-id>,ip_address=192.168.57.101
>   --nic-type=<slow | fast | foobar>
>   <net-id>
> nova boot --flavor m1.large --image <image_id> --nic port-id=<port-id>
>
> Where nic-type is just an extra bit of metadata, a string that is passed to
> nova and the VIF driver.
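>
> Purely as an illustration (not an agreed format), by the time it reaches the
> VIF driver that string might just sit alongside the rest of the vif entry in
> network_info, something like:
>
>     {"id": "port-id-2", "network": {...}, "nic-type": "fast"}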
>
>
> 2) Expand PCI alias information
>
> We need extensions to PCI alias so we can group SRIOV devices better.
>
> I still think we are yet to agree on a format, but I would suggest this as
> a starting point:
>
> {
>   "name": "GPU_fast",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "*",
>      "attach-type": "direct"},
>     {"vendor_id": "1137", "product_id": "0072", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {}
> }
>
> {
>   "name": "NIC_fast",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*",
>      "attach-type": "macvtap"},
>     {"vendor_id": "1234", "product_id": "0081", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {
>     "nic_type": "fast",
>     "network_ids": ["net-id-1", "net-id-2"]
>   }
> }
>
> {
>   "name": "NIC_slower",
>   "devices": [
>     {"vendor_id": "1137", "product_id": "0071", "address": "*",
>      "attach-type": "direct"},
>     {"vendor_id": "1234", "product_id": "0081", "address": "*",
>      "attach-type": "direct"}
>   ],
>   "sriov_info": {
>     "nic_type": "slower",
>     "network_ids": ["*"]  # this means it could attach to any network
>   }
> }
>
> The idea being that the VIF driver gets passed this info when network_info
> includes a nic that matches.
> Any other details, like the VLAN id, would come from neutron and be passed
> to the VIF driver as normal.
>
>
> 3) Reading "nic_type" and doing PCI passthrough for the NICs the user requests
>
> Not sure we are agreed on this, but basically:
> * network_info contains "nic-type" from neutron
> * need to select the correct VIF driver
> * need to pass matching PCI alias information to the VIF driver
> * neutron passes other details (like the VLAN id) as before
> * nova gives the VIF driver an API that allows it to attach PCI devices that
> are in the whitelist to the VM being configured
> * with all this, the VIF driver can do what it needs to do
> * let's keep it simple, and expand it as the need arises (a rough sketch of
> how this could fit together follows below)
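>
> As a rough, purely hypothetical sketch of how those bullets could hang
> together (select_vif_path and the PCI_ALIASES list below are illustrative
> names, not an agreed nova interface):
>
> # Hypothetical sketch only, not nova code: it shows the one decision the
> # VIF layer would make, namely use the normal virtual path unless the
> # nic-type from network_info matches a PCI alias/group entry.
>
> PCI_ALIASES = [
>     {"name": "NIC_fast", "nic_type": "fast",
>      "devices": [{"vendor_id": "1137", "product_id": "0071",
>                   "address": "*", "attach-type": "macvtap"}]},
> ]
>
> def select_vif_path(vif):
>     nic_type = vif.get("nic-type")
>     if not nic_type:
>         return ("virtual", None)           # no preference: normal vswitch NIC
>     for alias in PCI_ALIASES:
>         if alias["nic_type"] == nic_type:
>             return ("passthrough", alias)  # VIF driver gets the alias details
>     return ("virtual", None)               # unknown type: fall back for now
>
> print(select_vif_path({"id": "port-id-3", "nic-type": "fast"}))
>
> The only interesting part is the lookup; how nova then hands a whitelisted
> device to the VIF driver is exactly the API question in the bullets above.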
>
> 4) Make changes to VIF drivers, so the above is implemented
>
> Depends on (3)
>
>
>
> These seem like some good steps to get the basics in place for PCI
> passthrough networking.
> Once it's working, we can review it and see if there are things that need
> to evolve further.
>
> Does that seem like a workable approach?
> Who is willing to implement any of (1), (2) and (3)?
>
>
> Cheers,
> John
>
>
> On 9 January 2014 17:47, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:
> > I think I'm in agreement with all of this.  Nice summary, Robert.
> >
> > It may not be where the work ends, but if we could get this done, the
> > rest is just refinement.
> >
> >
> > On 9 January 2014 17:49, Robert Li (baoli) <baoli at cisco.com> wrote:
> >>
> >> Hi Folks,
> >>
> >>
> >> With John joining us on IRC, we have so far had a couple of productive
> >> meetings in an effort to come to consensus and move forward. Thanks,
> >> John, for doing that, and I appreciate everyone's effort to make it to
> >> the daily meeting.
> >> Let's reconvene on Monday.
> >>
> >> But before that, and based on today's conversation on IRC, I'd
> >> like to say a few things. I think that, first of all, we need to get
> >> agreement on the terminology that we have been using so far. With the
> >> current nova PCI passthrough:
> >>
> >>         PCI whitelist: defines all the available PCI passthrough
> >> devices on a compute node. pci_passthrough_whitelist=[{
> >> "vendor_id":"xxxx","product_id":"xxxx"}]
> >>         PCI Alias: criteria defined on the controller node with which
> >> requested PCI passthrough devices can be selected from all the PCI
> >> passthrough devices available in a cloud.
> >>                 Currently it has the following format:
> >> pci_alias={"vendor_id":"xxxx", "product_id":"xxxx", "name":"str"}
> >>
> >>         nova flavor extra_specs: a request for PCI passthrough devices
> >> can be specified with extra_specs, in a format such as:
> >> "pci_passthrough:alias"="name:count"
> >>
> >> As you can see, currently a PCI alias has a name and is defined on
> >> the controller. The implication is that when matching it against the
> >> PCI devices, the vendor_id and product_id have to be matched against
> >> all the available PCI devices until one is found. The name is
> >> only used for reference in the extra_specs. On the other hand, the
> >> whitelist is basically the same as the alias without a name.
> >>
> >> What we have discussed so far is based on something called PCI groups
> >> (or PCI flavors, as Yongli puts it). Without introducing other
> >> complexities, and with a little change to the above representation,
> >> we will have something
> >> like:
> >>
> >> pci_passthrough_whitelist=[{ "vendor_id":"xxxx","product_id":"xxxx",
> >> "name":"str"}]
> >>
> >> By doing so, we eliminate the PCI alias, and we call the "name"
> >> above a PCI group name. You can think of it as combining the
> >> definitions of the existing whitelist and PCI alias. And believe it
> >> or not, a PCI group is actually a PCI alias. However, with that
> >> change of thinking, a lot of benefits can be harvested:
> >>
> >>          * the implementation is significantly simplified
> >>          * provisioning is simplified by eliminating the PCI alias
> >>          * a compute node only needs to report stats with something like
> >> PCI group name:count (see the illustration after this list). A compute
> >> node processes all the PCI passthrough devices against the whitelist,
> >> and assigns a PCI group based on the whitelist definition.
> >>          * on the controller, we may only need to define the PCI
> >> group names. If we use a nova API to define PCI groups (which could be
> >> private or public, for example), one potential benefit, among other
> >> things (validation, etc.), is that they can be owned by the tenant that
> >> creates them. And thus wholesaling of PCI passthrough devices is also
> >> possible.
> >>          * the scheduler only works with PCI group names
> >>          * requests for PCI passthrough devices are based on PCI groups
> >>          * deployers can provision the cloud based on the PCI groups
> >>          * particularly for SRIOV, deployers can design SRIOV PCI
> >> groups based on network connectivity.
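> >>
> >> As an illustration only (not a proposed format), the per-node stats
> >> report mentioned in the list above could be as simple as a mapping of
> >> group name to available device count:
> >>
> >>          pci_group_stats = {"GroupA": 8, "GroupB": 2}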
> >>
> >> Further, to support SRIOV, we are saying that PCI group names can be
> >> used not only in the extra specs, but also in the --nic option and the
> >> neutron commands. This allows the most flexibility and functionality
> >> afforded by SRIOV.
> >>
> >> Further, we are saying that we can define default PCI groups based on
> >> the PCI device's class.
> >>
> >> For vnic-type (or nic-type), we are saying that it defines the link
> >> characteristics of the nic that is attached to a VM: a nic that's
> >> connected to a virtual switch, a nic that is connected to a physical
> >> switch, or a nic that is connected to a physical switch, but has a
> >> host macvtap device in between. The actual names of the choices are
> >> not important here, and can be debated.
> >>
> >> I'm hoping that we can go over the above on Monday. But any comments
> >> are welcome by email.
> >>
> >> Thanks,
> >> Robert
> >>
> >>
> >
> >
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>