[openstack-dev] [nova] [neutron] PCI pass-through network support

Henry Gessau gessau at cisco.com
Tue Oct 29 21:23:29 UTC 2013


On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong <yunhong.jiang at intel.com> wrote:

> Henry, why do you think the "service VM" needs the entire PF instead of a
> VF? I think the SR-IOV NIC should provide QoS and performance isolation.

I was speculating. I just thought it might be a good idea to leave open the
possibility of assigning a PF to a VM if the need arises.

Neutron service VMs are a new thing. I will be following the discussions and
there is a summit session for them. It remains to be seen whether there is
any desire/need for full PF ownership of NICs. But if a service VM owns the
PF and has the right NIC driver, it could enable some advanced features.

> As for assigning an entire PCI device to a guest, that should be OK since
> usually the PF and VF have different device IDs. The tricky thing is, at
> least for some PCI devices, you can't configure some NICs to have SR-IOV
> enabled while others do not.

Thanks for the warning. :) Perhaps the cloud admin might plug in an extra
NIC in just a few nodes (one or two per rack, maybe) for the purpose of
running service VMs there. Again, just speculating. I don't know how hard it
is to manage non-homogeneous nodes.

> 
> Thanks
> --jyh
> 
>> -----Original Message-----
>> From: Henry Gessau [mailto:gessau at cisco.com]
>> Sent: Tuesday, October 29, 2013 8:10 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>> 
>> Lots of great info and discussion going on here.
>> 
>> One additional thing I would like to mention concerns PF and VF usage.
>> 
>> Normally VFs will be assigned to instances, and the PF will either not be
>> used at all, or some agent on the compute node host might have access to
>> the PF for something (management?).
>> 
>> There is a neutron design track around the development of "service VMs".
>> These are dedicated instances that run neutron services like routers,
>> firewalls, etc. It is plausible that a service VM would like to use PCI
>> passthrough and get the entire PF. This would allow it to have complete
>> control over a physical link, which I think will be wanted in some cases.
>> 
>> --
>> Henry
>> 
>> On Tue, Oct 29, at 10:23 am, Irena Berezovsky <irenab at mellanox.com>
>> wrote:
>> 
>> > Hi,
>> >
>> > I would like to share some details regarding the support provided by
>> > the Mellanox plugin. It enables networking via SRIOV pass-through
>> > devices or macvtap interfaces. The plugin is available here:
>> > https://github.com/openstack/neutron/tree/master/neutron/plugins/mlnx.
>> >
>> > To support both the PCI pass-through device and macvtap interface types
>> > of vNICs, we set the neutron port profile:vnic_type according to the
>> > required VIF type and then use the created port to 'nova boot' the VM.
>> >
>> > To overcome the missing scheduler awareness for PCI devices, which was
>> > not yet part of the Havana release, we have an additional service (the
>> > embedded switch daemon) that runs on each compute node.
>> >
>> > This service manages SRIOV resource allocation, answers vNIC discovery
>> > queries and applies VLAN/MAC configuration using standard Linux APIs
>> > (code is here: https://github.com/mellanox-openstack/mellanox-eswitchd).
>> > The embedded switch daemon serves as a glue layer between the VIF
>> > driver and the Neutron agent.
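>> >
>> > As a rough illustration of the kind of VF configuration the daemon
>> > applies (assuming it shells out to iproute2; the device name and VF
>> > index below are made-up examples, and the real logic lives in the
>> > eswitchd repo above):
>> >
>> >     # Sketch only: apply MAC/VLAN to a VF via iproute2.
>> >     import subprocess
>> >
>> >     def configure_vf(pf_dev, vf_index, mac, vlan):
>> >         # program the VF's MAC on the embedded switch
>> >         subprocess.check_call(['ip', 'link', 'set', pf_dev,
>> >                                'vf', str(vf_index), 'mac', mac])
>> >         # and its VLAN
>> >         subprocess.check_call(['ip', 'link', 'set', pf_dev,
>> >                                'vf', str(vf_index), 'vlan', str(vlan)])
>> >
>> >     configure_vf('eth2', 0, 'fa:16:3e:00:00:01', 100)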
>> >
>> > In the Icehouse release, when SRIOV resource allocation is already part
>> > of Nova, we plan to eliminate the need for the embedded switch daemon
>> > service. So what is left to figure out is how to tie the neutron port
>> > to the PCI device and invoke the networking configuration.
>> >
>> >
>> >
>> > In our case what we have is actually a hardware VEB that is programmed
>> > via neither 802.1Qbg nor 802.1Qbh, but is configured locally by the
>> > Neutron agent. We also support both Ethernet and InfiniBand physical
>> > network L2 technologies. This means that we apply different
>> > configuration commands to set the configuration on the VF.
>> >
>> >
>> >
>> > I guess what we have to figure out is how to support the generic case
>> > of PCI device networking, covering the HW VEB, 802.1Qbg and 802.1Qbh
>> > cases.
>> >
>> >
>> >
>> > BR,
>> >
>> > Irena
>> >
>> >
>> >
>> > From: Robert Li (baoli) [mailto:baoli at cisco.com]
>> > Sent: Tuesday, October 29, 2013 3:31 PM
>> > To: Jiang, Yunhong; Irena Berezovsky; prashant.upadhyaya at aricent.com;
>> > chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> > Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> > Mestery (kmestery); Sandhya Dasu (sadasu)
>> > Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through
>> > network support
>> >
>> >
>> >
>> > Hi Yunhong,
>> >
>> >
>> >
>> > I haven't looked at Mellanox in much detail. I think that we'll get more
>> > details from Irena down the road. Regarding your question, I can only
>> answer
>> > based on my experience with Cisco's VM-FEX. In a nutshell:
>> >
>> >      -- a vNIC is connected to an external switch. Once the host is
>> > booted up, all the PFs and VFs provisioned on the vNIC will be created,
>> > as well as all the corresponding Ethernet interfaces.
>> >
>> >      -- As far as Neutron is concerned, a neutron port can be
>> associated
>> > with a VF. One way to do so is to specify this requirement in the -nic
>> > option, providing information such as:
>> >
>> >                . PCI alias (this is the same alias as defined in your nova
>> > blueprints)
>> >
>> >                . direct pci-passthrough/macvtap
>> >
>> >                . port profileid that is compliant with 802.1Qbh
>> >
>> >      -- similar to how you translate a nova flavor with PCI
>> > requirements into PCI requests for scheduling purposes, Nova API (the
>> > nova-api component) can translate the above into PCI requests as well
>> > (a rough sketch follows below). I can give more detail later on this.
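>> >
>> >      As a purely illustrative sketch (the attribute and key names here
>> >      are assumptions of mine, not the blueprint syntax):
>> >
>> >          # Illustrative only: turn the SRIOV attributes of a -nic
>> >          # option into a PCI request the scheduler can match against
>> >          # discovered devices.
>> >          def nic_option_to_pci_request(nic):
>> >              return {
>> >                  'count': 1,
>> >                  # match devices by the configured PCI alias
>> >                  'spec': [{'alias': nic['pci_alias']}],
>> >                  # carried along for the VIF driver to consume later
>> >                  'extra_info': {'vnic_type': nic.get('vnic_type',
>> >                                                      'direct'),
>> >                                 'profileid': nic.get('profileid')},
>> >              }
>> >
>> >          request = nic_option_to_pci_request(
>> >              {'pci_alias': 'intel-82599-vf', 'vnic_type': 'direct',
>> >               'profileid': 'my-port-profile'})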
>> >
>> >
>> >
>> > Regarding your last question, since the vNIC is already connected to
>> > the external switch, the vNIC driver will be responsible for
>> > communicating the port profile to the external switch. As you already
>> > know, libvirt provides several ways to specify that a VM be booted up
>> > with SRIOV. For example, in the following interface definition:
>> >
>> >
>> >
>> >   <interface type='hostdev' managed='yes'>
>> >     <source>
>> >       <address type='pci' domain='0' bus='0x09' slot='0x0' function='0x01'/>
>> >     </source>
>> >     <mac address='01:23:45:67:89:ab'/>
>> >     <virtualport type='802.1Qbh'>
>> >       <parameters profileid='my-port-profile'/>
>> >     </virtualport>
>> >   </interface>
>> >
>> >
>> >
>> > The SRIOV VF (bus 0x09, VF 0x01) will be allocated, and the port
>> > profile 'my-port-profile' will be used to provision this VF. Libvirt
>> > will be responsible for invoking the vNIC driver to configure this VF
>> > with the port profile 'my-port-profile'. The driver will talk to the
>> > external switch using the 802.1Qbh standard to complete the VF's
>> > configuration and binding with the VM.
>> >
>> >
>> >
>> > Now that nova PCI passthrough is responsible for
>> > discovering/scheduling/allocating a VF, the rest of the puzzle is to
>> > associate this PCI device with the feature that's going to use it, and
>> > the feature will be responsible for configuring it. You can also see
>> > from the above example that, in one implementation of SRIOV, the
>> > feature (in this case neutron) may not need to do much in terms of
>> > working with the external switch; the work is actually done by libvirt
>> > behind the scenes.
>> >
>> >
>> >
>> > Now the questions are:
>> >
>> >         -- how the port profile gets defined/managed
>> >
>> >         -- how the port profile gets associated with a neutron network
>> >
>> > The first question will be specific to the particular product, and
>> > therefore a particular neutron plugin has to manage that.
>> >
>> > There may be several approaches to addressing the second question. For
>> > example, in the simplest case, a port profile can be associated with a
>> > neutron network. This has some significant drawbacks: since the port
>> > profile defines features for all the ports that use it, a one-to-one
>> > mapping between port profile and neutron network would mean that all
>> > the ports on the network have exactly the same features (for example,
>> > QoS characteristics). To make it flexible, the binding of a port
>> > profile to a port may instead be done at port creation time (see the
>> > sketch below).
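>> >
>> > As a hypothetical sketch of such per-port binding (the use of
>> > binding:profile and the 'profileid' key are my assumptions, not an
>> > agreed-upon API):
>> >
>> >     # Hypothetical: bind a port profile to an individual port at
>> >     # creation time via python-neutronclient.
>> >     from neutronclient.v2_0 import client
>> >
>> >     neutron = client.Client(username='admin', password='secret',
>> >                             tenant_name='demo',
>> >                             auth_url='http://127.0.0.1:5000/v2.0/')
>> >     port = neutron.create_port({'port': {
>> >         'network_id': 'REPLACE-WITH-NETWORK-UUID',
>> >         'binding:profile': {'profileid': 'my-port-profile'},
>> >     }})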
>> >
>> >
>> >
>> > Let me know if the above answered your question.
>> >
>> >
>> >
>> > thanks,
>> >
>> > Robert
>> >
>> >
>> >
>> > On 10/29/13 3:03 AM, "Jiang, Yunhong" <yunhong.jiang at intel.com> wrote:
>> >
>> >
>> >
>> >     Robert, is it possible to have an IRC meeting? I'd prefer an IRC
>> >     meeting because it's more OpenStack style and also makes it easy
>> >     to keep clear minutes.
>> >
>> >
>> >
>> >     To your flow, can you give a more detailed example? For example,
>> >     consider a user specifying the instance with a -nic option that
>> >     specifies a network id; how does nova derive the requirement for
>> >     the PCI device? I assume the network id should define the switches
>> >     that the device can connect to, but how is that information
>> >     translated to the PCI property requirement? Will this translation
>> >     happen before the nova scheduler makes the host decision?
>> >
>> >
>> >
>> >     Thanks
>> >
>> >     --jyh
>> >
>> >
>> >
>> >     From: Robert Li (baoli) [mailto:baoli at cisco.com]
>> >     Sent: Monday, October 28, 2013 12:22 PM
>> >     To: Irena Berezovsky; prashant.upadhyaya at aricent.com; Jiang,
>> >     Yunhong; chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> >     Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> >     Mestery (kmestery); Sandhya Dasu (sadasu)
>> >     Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through
>> >     network support
>> >
>> >
>> >
>> >     Hi Irena,
>> >
>> >
>> >
>> >     Thank you very much for your comments. See inline.
>> >
>> >
>> >
>> >     --Robert
>> >
>> >
>> >
>> >     On 10/27/13 3:48 AM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
>> >
>> >
>> >
>> >         Hi Robert,
>> >
>> >         Thank you very much for sharing the information regarding your
>> >         efforts. Can you please share your idea of the end-to-end flow?
>> >         How do you suggest binding Nova and Neutron?
>> >
>> >
>> >
>> >     The end-to-end flow is actually encompassed in the blueprints, in
>> >     a nutshell. I will reiterate it below. The binding between Nova and
>> >     Neutron occurs via the neutron v2 API that nova invokes in order to
>> >     provision the neutron services. The vif driver is responsible for
>> >     plugging an instance into the networking setup that neutron has
>> >     created on the host.
>> >
>> >
>> >
>> >     Normally, one will invoke the "nova boot" api with the -nic options
>> >     to specify the nic with which the instance will be connected to the
>> >     network. It currently allows a net-id, fixed ip and/or port-id to
>> >     be specified for the option. However, it doesn't allow one to
>> >     specify special networking requirements for the instance. Thanks to
>> >     the nova pci-passthrough work, one can specify PCI passthrough
>> >     device(s) in the nova flavor. But it doesn't provide a means to tie
>> >     these PCI devices, in the case of ethernet adapters, to networking
>> >     services. Therefore the idea is actually simple, as indicated by
>> >     the blueprint titles: to provide a means to tie SRIOV devices to
>> >     neutron services. A workflow would roughly look like this for
>> >     'nova boot':
>> >
>> >
>> >
>> >           -- Specifies networking requirements in the -nic option.
>> >     Specifically for SRIOV, allow the following to be specified in addition
>> >     to the existing required information:
>> >
>> >                    . PCI alias
>> >
>> >                    . direct pci-passthrough/macvtap
>> >
>> >                    . port profileid that is compliant with 802.1Qbh
>> >
>> >
>> >
>> >             The above information is optional. In its absence, the
>> >     existing behavior remains.
>> >
>> >
>> >
>> >          -- if special networking requirements exist, nova-api creates
>> >     PCI requests in the nova instance type for scheduling purposes
>> >
>> >
>> >
>> >          -- Nova scheduler schedules the instance based on the
>> requested
>> >     flavor plus the PCI requests that are created for networking.
>> >
>> >
>> >
>> >          -- Nova compute invokes neutron services with the PCI
>> >     passthrough information, if any
>> >
>> >
>> >
>> >          -- Neutron performs its normal operations based on the
>> >     request, such as allocating a port, assigning IP addresses, etc.
>> >     Specific to SRIOV, it should validate information such as the
>> >     profileid, and store it in its db. It's also possible to associate
>> >     a port profileid with a neutron network so that the port profileid
>> >     becomes optional in the -nic option. Neutron returns the port
>> >     information to nova, especially the PCI passthrough related
>> >     information, in the port binding object. Currently, the port
>> >     binding object contains the following information:
>> >
>> >               binding:vif_type
>> >
>> >               binding:host_id
>> >
>> >               binding:profile
>> >
>> >               binding:capabilities
>> >
>> >
>> >
>> >         -- nova constructs the domain xml and plugs in the instance by
>> >     calling the vif driver. The vif driver can build up the interface
>> >     xml based on the port binding information (rough sketch below).
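>> >
>> >     As a purely illustrative sketch (the binding keys used here are
>> >     assumptions of mine, and this is not actual nova vif driver code):
>> >
>> >         # Sketch only: build the hostdev interface XML from port
>> >         # binding data. Key names ('pci_slot', 'profileid') are
>> >         # assumed, not the real binding schema.
>> >         def build_interface_xml(profile, mac):
>> >             domain, bus, slot, func = profile['pci_slot']
>> >             return (
>> >                 "<interface type='hostdev' managed='yes'>\n"
>> >                 "  <source>\n"
>> >                 "    <address type='pci' domain='%s' bus='%s' "
>> >                 "slot='%s' function='%s'/>\n"
>> >                 "  </source>\n"
>> >                 "  <mac address='%s'/>\n"
>> >                 "  <virtualport type='802.1Qbh'>\n"
>> >                 "    <parameters profileid='%s'/>\n"
>> >                 "  </virtualport>\n"
>> >                 "</interface>"
>> >             ) % (domain, bus, slot, func, mac, profile['profileid'])
>> >
>> >         print(build_interface_xml(
>> >             {'pci_slot': ('0', '0x09', '0x0', '0x01'),
>> >              'profileid': 'my-port-profile'},
>> >             '01:23:45:67:89:ab'))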
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >         The blueprints you registered make sense. On the Nova side,
>> >         there is a need to bind the requested virtual network to the
>> >         PCI device/interface to be allocated as a vNIC.
>> >
>> >         On the Neutron side, there is a need to support networking
>> >         configuration of the vNIC. Neutron should be able to identify
>> >         the PCI device/macvtap interface in order to apply the
>> >         configuration. I think it makes sense to provide neutron
>> >         integration via a dedicated Modular Layer 2 mechanism driver to
>> >         allow PCI pass-through vNIC support along with other networking
>> >         technologies (rough skeleton below).
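>> >
>> >         For illustration only, a rough skeleton of what such a driver
>> >         might look like (the MechanismDriver base class and its hooks
>> >         are the ML2 driver API; everything inside the methods is
>> >         hypothetical):
>> >
>> >             # Rough skeleton of an SRIOV ML2 mechanism driver.
>> >             from neutron.plugins.ml2 import driver_api as api
>> >
>> >             class SriovMechanismDriver(api.MechanismDriver):
>> >                 def initialize(self):
>> >                     # e.g. load supported vnic types / device mappings
>> >                     pass
>> >
>> >                 def create_port_postcommit(self, context):
>> >                     profile = (context.current.get('binding:profile')
>> >                                or {})
>> >                     if 'pci_slot' in profile:  # hypothetical key
>> >                         # apply VLAN/MAC on the allocated VF
>> >                         self._configure_vf(profile)
>> >
>> >                 def _configure_vf(self, profile):
>> >                     # vendor-specific: HW VEB, 802.1Qbg or 802.1Qbh
>> >                     pass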
>> >
>> >
>> >
>> >     I haven't sorted through this yet. A neutron port could be
>> >     associated with a PCI device or not, which is a common feature,
>> >     IMHO. However, an ML2 driver specific to a particular SRIOV
>> >     technology may be needed.
>> >
>> >
>> >
>> >
>> >
>> >         During the Havana release, we introduced the Mellanox Neutron
>> >         plugin that enables networking via SRIOV pass-through devices
>> >         or macvtap interfaces.
>> >
>> >         We want to integrate our solution with PCI pass-through Nova
>> >         support.  I will be glad to share more details if you are
>> interested.
>> >
>> >
>> >
>> >
>> >
>> >     Good to know that you already have an SRIOV implementation. I found
>> >     some information online about the mlnx plugin, but need more time
>> >     to get to know it better. And certainly I'm interested in knowing
>> >     its details.
>> >
>> >
>> >
>> >         The PCI pass-through networking support is planned to be
>> >         discussed during the summit:
>> >         http://summit.openstack.org/cfp/details/129. I think it's worth
>> >         drilling down into a more detailed proposal and presenting it
>> >         during the summit, especially since it impacts both the nova
>> >         and neutron projects.
>> >
>> >
>> >
>> >     I agree. Maybe we can steal some time in that discussion.
>> >
>> >
>> >
>> >         Would you be interested in collaborating on this effort? Would
>> >         you be interested in exchanging more emails or setting up an
>> >         IRC/WebEx meeting this week before the summit?
>> >
>> >
>> >
>> >     Sure. If folks want to discuss it before the summit, we can
>> >     schedule a webex later this week. Otherwise, we can continue the
>> >     discussion over email.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >         Regards,
>> >
>> >         Irena
>> >
>> >
>> >
>> >         From: Robert Li (baoli) [mailto:baoli at cisco.com]
>> >         Sent: Friday, October 25, 2013 11:16 PM
>> >         To: prashant.upadhyaya at aricent.com; Irena Berezovsky;
>> >         yunhong.jiang at intel.com; chris.friesen at windriver.com;
>> >         yongli.he at intel.com
>> >         Cc: OpenStack Development Mailing List; Brian Bowen (brbowen);
>> >         Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
>> >         Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through
>> >         network support
>> >
>> >
>> >
>> >         Hi Irena,
>> >
>> >
>> >
>> >         This is Robert Li from Cisco Systems. Recently, I was tasked to
>> >         investigate such support for Cisco's systems that support
>> >         VM-FEX, which is an SRIOV technology supporting 802.1Qbh. I was
>> >         able to bring up nova instances with SRIOV interfaces, and
>> >         establish networking between the instances that employ the
>> >         SRIOV interfaces. Certainly, this was accomplished with hacking
>> >         and some manual intervention. Based on this experience and my
>> >         study of the two existing nova pci-passthrough blueprints that
>> >         have been implemented and committed into Havana
>> >         (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base
>> >         and
>> >         https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),
>> >         I registered a couple of blueprints (one on the Nova side, the
>> >         other on the Neutron side):
>> >
>> >
>> >
>> >
>> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
>> >
>> >
>> https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov
>> >
>> >
>> >
>> >         in order to address SRIOV support in openstack.
>> >
>> >
>> >
>> >         Please take a look at them and see if they make sense, and let
>> >         me know of any comments and questions. We can also discuss this
>> >         at the summit, I suppose.
>> >
>> >
>> >
>> >         I noticed that there is another thread on this topic, so I am
>> >         copying those folks from that thread as well.
>> >
>> >
>> >
>> >         thanks,
>> >
>> >         Robert
>> >
>> >
>> >
>> >         On 10/16/13 4:32 PM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
>> >
>> >
>> >
>> >             Hi,
>> >
>> >             One of the next steps for PCI pass-through I would like to
>> >             discuss is the support for PCI pass-through vNICs.
>> >
>> >             While nova takes care of PCI pass-through device resource
>> >             management and VIF settings, neutron should manage their
>> >             networking configuration.
>> >
>> >             I would like to register a summit proposal to discuss the
>> >             support for PCI pass-through networking.
>> >
>> >             I am not sure what would be the right topic to discuss PCI
>> >             pass-through networking in, since it involves both nova and
>> >             neutron.
>> >
>> >             There is already a session registered by Yongli on the nova
>> >             topic to discuss the PCI pass-through next steps.
>> >
>> >             I think PCI pass-through networking is quite a big topic
>> >             and it is worth a separate discussion.
>> >
>> >             Are there any other people who are interested in discussing
>> >             it and sharing their thoughts and experience?
>> >
>> >
>> >
>> >             Regards,
>> >
>> >             Irena
>> >
>> >
>> >
>> >
>> >
>> 
> 


