[openstack-dev] [nova] [neutron] PCI pass-through network support
Irena Berezovsky
irenab at mellanox.com
Tue Oct 29 14:23:14 UTC 2013
Hi,
I would like to share some details regarding the support provided by the Mellanox plugin. It enables networking via SRIOV pass-through devices or macvtap interfaces. The plugin is available here: https://github.com/openstack/neutron/tree/master/neutron/plugins/mlnx.
To support both PCI pass-through device and macvtap interface types of vNICs, we set the neutron port's profile:vnic_type according to the required VIF type and then use the created port to 'nova boot' the VM.
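As a rough illustration of the step above, the following sketch builds a port-create request body that carries the vNIC type in the port's binding profile. The attribute name `binding:profile`, the key `vnic_type`, and the values `hostdev` and `macvtap` are assumptions made for illustration; the plugin documentation is the authority on the exact names.

```python
import json

# Assumed vNIC type values; the values the plugin accepts may differ.
VNIC_TYPES = ("hostdev", "macvtap")


def build_port_request(network_id, vnic_type):
    """Return a port-create body that carries the requested vNIC type."""
    if vnic_type not in VNIC_TYPES:
        raise ValueError("unsupported vnic_type: %s" % vnic_type)
    return {
        "port": {
            "network_id": network_id,
            # Assumed attribute used to pass the vNIC type to the plugin.
            "binding:profile": {"vnic_type": vnic_type},
        }
    }


if __name__ == "__main__":
    # "NET_ID" is a placeholder, not a real network id.
    body = build_port_request("NET_ID", "hostdev")
    print(json.dumps(body, indent=2, sort_keys=True))
```

The id of the port created from such a request would then be passed to 'nova boot' as the port-id of the nic.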
To overcome the missing scheduler awareness for PCI devices, which was not yet part of the Havana release, we have an additional service (embedded switch daemon) that runs on each compute node.
This service manages the SRIOV resources allocation, answers vNICs discovery queries and applies VLAN/MAC configuration using standard Linux APIs (code is here: https://github.com/mellanox-openstack/mellanox-eswitchd ). The embedded switch Daemon serves as a glue layer between VIF Driver and Neutron Agent.
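To make the "standard Linux APIs" part concrete, here is a minimal sketch of how a VLAN/MAC setting for one Ethernet VF could be expressed as an iproute2 command. It is not taken from the eswitchd code; the function name and the interface name in the usage example are hypothetical, and the InfiniBand case is not covered.

```python
def vf_config_command(pf_ifname, vf_index, mac, vlan_id):
    """Build the iproute2 argv that sets the MAC and VLAN of one VF.

    The command is only built here, not executed; running it needs root
    privileges on a host whose PF exposes SRIOV VFs.
    """
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN id out of range: %d" % vlan_id)
    return [
        "ip", "link", "set", "dev", pf_ifname,
        "vf", str(vf_index),
        "mac", mac,
        "vlan", str(vlan_id),
    ]


if __name__ == "__main__":
    # Hypothetical PF name, VF index, MAC address and VLAN id.
    print(" ".join(vf_config_command("eth2", 3, "fa:16:3e:00:00:01", 100)))
```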
In the Icehouse release, when SRIOV resource allocation is already part of Nova, we plan to eliminate the need for the embedded switch daemon service. What is left to figure out is how to tie a neutron port to a PCI device and invoke the networking configuration.
In our case what we have is actually a hardware VEB that is not programmed via either 802.1Qbg or 802.1Qbh, but configured locally by the Neutron agent. We also support both Ethernet and InfiniBand as physical network L2 technologies. This means that we apply different configuration commands to configure a VF.
I guess what we have to figure out is how to support the generic case for the PCI device networking support, for HW VEB, 802.1Qbg and 802.1Qbh cases.
BR,
Irena
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Tuesday, October 29, 2013 3:31 PM
To: Jiang, Yunhong; Irena Berezovsky; prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Yunhong,
I haven't looked at Mellanox in much detail. I think that we'll get more details from Irena down the road. Regarding your question, I can only answer based on my experience with Cisco's VM-FEX. In a nutshell:
-- a vNIC is connected to an external switch. Once the host is booted up, all the PFs and VFs provisioned on the vNIC will be created, as well as all the corresponding ethernet interfaces.
-- As far as Neutron is concerned, a neutron port can be associated with a VF. One way to do so is to specify this requirement in the -nic option, providing information such as:
. PCI alias (this is the same alias as defined in your nova blueprints)
. direct pci-passthrough/macvtap
. port profileid that is compliant with 802.1Qbh
-- similar to how you translate the nova flavor with PCI requirements to PCI requests for scheduling purposes, Nova API (the nova api component) can translate the above to PCI requests for scheduling. I can give more detail on this later.
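The translation described above could look roughly like the sketch below. Everything in it is hypothetical: the nic keys (`pci-alias`, `vnic-type`, `port-profile`) and the shape of the resulting PCI request are placeholders chosen for illustration, not an existing nova interface.

```python
def nic_to_pci_request(nic):
    """Translate one hypothetical nic spec into a PCI request, or None."""
    alias = nic.get("pci-alias")
    if alias is None:
        # No SRIOV requirement on this nic: existing behavior applies.
        return None
    return {
        "alias_name": alias,
        "count": 1,  # one VF per nic
        "vnic_type": nic.get("vnic-type", "direct"),
        "profileid": nic.get("port-profile"),
    }


def pci_requests_for_boot(nics):
    """Collect the PCI requests implied by all nics of a boot request."""
    requests = (nic_to_pci_request(nic) for nic in nics)
    return [request for request in requests if request is not None]


if __name__ == "__main__":
    nics = [
        {"net-id": "NET_A"},
        {"net-id": "NET_B", "pci-alias": "vf-alias",
         "vnic-type": "macvtap", "port-profile": "my-port-profile"},
    ]
    print(pci_requests_for_boot(nics))
```

Only the second nic yields a request, so an instance without SRIOV options is scheduled exactly as before.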
Regarding your last question, since the vNIC is already connected with the external switch, the vNIC driver will be responsible for communicating the port profile to the external switch. As you already know, libvirt provides several ways to specify that a VM be booted up with SRIOV. For example, in the following interface definition:
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0' bus='0x09' slot='0x0' function='0x01'/>
  </source>
  <mac address='01:23:45:67:89:ab' />
  <virtualport type='802.1Qbh'>
    <parameters profileid='my-port-profile' />
  </virtualport>
</interface>
The SRIOV VF (bus 0x09, function 0x01) will be allocated, and the port profile 'my-port-profile' will be used to provision this VF. Libvirt will be responsible for invoking the vNIC driver to configure this VF with the port profile my-port-profile. The driver will talk to the external switch using the 802.1Qbh standard to complete the VF's configuration and binding with the VM.
Now that nova PCI passthrough is responsible for discovering/scheduling/allocating a VF, the rest of the puzzle is to associate this PCI device with the feature that's going to use it, and the feature will be responsible for configuring it. As you can also see from the above example, in one implementation of SRIOV, the feature (in this case neutron) may not need to do much in terms of working with the external switch; the work is actually done by libvirt behind the scenes.
Now the questions are:
-- how the port profile gets defined/managed
-- how the port profile gets associated with a neutron network
The first question will be specific to the particular product, and therefore a particular neutron plugin has to manage that.
There may be several approaches to address the second question. For example, in the simplest case, a port profile can be associated with a neutron network. This has some significant drawbacks. Since the port profile defines features for all the ports that use it, the one port profile to one neutron network mapping would mean all the ports on the network will have exactly the same features (for example, QoS characteristics). To make it flexible, the binding of a port profile to a port may be done at the port creation time.
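The two options can be combined: a network carries a default port profile and an individual port may override it at creation time. A minimal sketch, assuming hypothetical field names `profileid` (on the port) and `default_profileid` (on the network):

```python
def resolve_port_profile(port, network):
    """Pick the port profile for a port.

    A profile given at port creation wins; otherwise the network's
    default applies. Returns None when neither is set.
    """
    return port.get("profileid") or network.get("default_profileid")


if __name__ == "__main__":
    network = {"id": "NET_A", "default_profileid": "best-effort"}
    print(resolve_port_profile({"id": "P1"}, network))
    print(resolve_port_profile({"id": "P2", "profileid": "gold-qos"}, network))
```

With this arrangement, ports on the same network can still differ in features such as QoS, while the profile remains optional when creating a port.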
Let me know if the above answered your question.
thanks,
Robert
On 10/29/13 3:03 AM, "Jiang, Yunhong" <yunhong.jiang at intel.com> wrote:
Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting because it's more OpenStack style and also keeps the minutes clearly.
Regarding your flow, can you give a more detailed example? For example, suppose the user specifies the instance with a -nic option that specifies a network id; how does nova then derive the requirement for the PCI device? I assume the network id should define the switches that the device can connect to, but how is that information translated to the PCI property requirement? Will this translation happen before the nova scheduler makes the host decision?
Thanks
--jyh
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; prashant.upadhyaya at aricent.com; Jiang, Yunhong; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Irena,
Thank you very much for your comments. See inline.
--Robert
On 10/27/13 3:48 AM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you please share your idea of the end to end flow? How do you suggest to bind Nova and Neutron?
The end to end flow is, in a nutshell, encompassed in the blueprints. I will reiterate it below. The binding between Nova and Neutron occurs with the neutron v2 API that nova invokes in order to provision the neutron services. The vif driver is responsible for plugging an instance into the networking setup that neutron has created on the host.
Normally, one will invoke the "nova boot" api with the -nic options to specify the nic with which the instance will be connected to the network. It currently allows net-id, fixed ip and/or port-id to be specified for the option. However, it doesn't allow one to specify special networking requirements for the instance. Thanks to the nova pci-passthrough work, one can specify PCI passthrough device(s) in the nova flavor. But it doesn't provide a means to tie these PCI devices, in the case of ethernet adapters, to networking services. Therefore the idea is actually simple, as indicated by the blueprint titles: to provide a means to tie SRIOV devices to neutron services. A work flow would roughly look like this for 'nova boot':
-- Specifies networking requirements in the -nic option. Specifically for SRIOV, allow the following to be specified in addition to the existing required information:
. PCI alias
. direct pci-passthrough/macvtap
. port profileid that is compliant with 802.1Qbh
The above information is optional. In the absence of them, the existing behavior remains.
-- if special networking requirements exist, Nova api creates PCI requests in the nova instance type for scheduling purpose
-- Nova scheduler schedules the instance based on the requested flavor plus the PCI requests that are created for networking.
-- Nova compute invokes neutron services with PCI passthrough information if any
-- Neutron performs its normal operations based on the request, such as allocating a port, assigning ip addresses, etc. Specific to SRIOV, it should validate information such as the profileid and store it in its db. It's also possible to associate a port profileid with a neutron network so that the port profileid becomes optional in the -nic option. Neutron returns the port information to nova, in particular the PCI passthrough related information in the port binding object. Currently, the port binding object contains the following information:
binding:vif_type
binding:host_id
binding:profile
binding:capabilities
-- nova constructs the domain xml and plugs in the instance by calling the vif driver. The vif driver can build up the interface xml based on the port binding information.
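As an illustration of the last step, the sketch below builds a libvirt hostdev interface element like the one shown elsewhere in this thread. How the PCI address and profileid would actually be carried in the port binding information is not settled here, so the function simply takes them as plain arguments; the function name and argument formats are assumptions.

```python
import xml.etree.ElementTree as ET


def build_hostdev_interface(pci_address, mac, profileid):
    """Build a libvirt <interface type='hostdev'> element as a string.

    pci_address is assumed to use the "dddd:bb:ss.f" form,
    for example "0000:09:00.1".
    """
    domain, bus, slot_function = pci_address.split(":")
    slot, function = slot_function.split(".")
    iface = ET.Element("interface", {"type": "hostdev", "managed": "yes"})
    source = ET.SubElement(iface, "source")
    ET.SubElement(source, "address", {
        "type": "pci",
        "domain": "0x" + domain,
        "bus": "0x" + bus,
        "slot": "0x" + slot,
        "function": "0x" + function,
    })
    ET.SubElement(iface, "mac", {"address": mac})
    vport = ET.SubElement(iface, "virtualport", {"type": "802.1Qbh"})
    ET.SubElement(vport, "parameters", {"profileid": profileid})
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    # Placeholder PCI address, MAC address and profile name.
    print(build_hostdev_interface("0000:09:00.1", "52:54:00:12:34:56",
                                  "my-port-profile"))
```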
The blueprints you registered make sense. On Nova side, there is a need to bind between requested virtual network and PCI device/interface to be allocated as vNIC.
On the Neutron side, there is a need to support networking configuration of the vNIC. Neutron should be able to identify the PCI device/macvtap interface in order to apply configuration. I think it makes sense to provide neutron integration via a dedicated Modular Layer 2 Mechanism Driver to allow PCI pass-through vNIC support along with other networking technologies.
I haven't sorted through this yet. A neutron port could be associated with a PCI device or not, which is a common feature, IMHO. However, an ML2 driver specific to a particular SRIOV technology may be needed.
During the Havana release, we introduced the Mellanox Neutron plugin that enables networking via SRIOV pass-through devices or macvtap interfaces.
We want to integrate our solution with PCI pass-through Nova support. I will be glad to share more details if you are interested.
Good to know that you already have an SRIOV implementation. I found some information online about the mlnx plugin, but need more time to get to know it better. And certainly I'm interested in knowing its details.
The PCI pass-through networking support is planned to be discussed during the summit: http://summit.openstack.org/cfp/details/129. I think it's worth drilling down into a more detailed proposal and presenting it during the summit, especially since it impacts both the nova and neutron projects.
I agree. Maybe we can steal some time in that discussion.
Would you be interested in collaborating on this effort? Would you be interested in exchanging more emails or setting up an IRC/WebEx meeting this week, before the summit?
Sure. If folks want to discuss it before the summit, we can schedule a webex later this week. Otherwise, we can continue the discussion by email.
Regards,
Irena
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhyaya at aricent.com; Irena Berezovsky; yunhong.jiang at intel.com; chris.friesen at windriver.com; yongli.he at intel.com
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Irena,
This is Robert Li from Cisco Systems. Recently, I was tasked to investigate such support for Cisco's systems that support VM-FEX, which is an SRIOV technology supporting 802.1Qbh. I was able to bring up nova instances with SRIOV interfaces, and establish networking between the instances that employ the SRIOV interfaces. Certainly, this was accomplished with hacking and some manual intervention. Based on this experience and my study of the two existing nova pci-passthrough blueprints that have been implemented and committed into Havana (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt), I registered a couple of blueprints (one on the Nova side, the other on the Neutron side):
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov
in order to address SRIOV support in openstack.
Please take a look at them and see if they make sense, and let me know of any comments and questions. We can also discuss this at the summit, I suppose.
I noticed that there is another thread on this topic, so I am copying the folks from that thread as well.
thanks,
Robert
On 10/16/13 4:32 PM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
Hi,
As one of the next steps for PCI pass-through, I would like to discuss the support for PCI pass-through vNICs.
While nova takes care of PCI pass-through device resource management and VIF settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI pass-through networking.
I am not sure what would be the right topic under which to discuss PCI pass-through networking, since it involves both nova and neutron.
There is already a session registered by Yongli on the nova topic to discuss the PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth having a separate discussion.
Are there any other people who are interested in discussing it and sharing their thoughts and experience?
Regards,
Irena