[openstack-dev] [nova] [neutron] PCI pass-through network support
Robert Li (baoli)
baoli at cisco.com
Mon Nov 11 22:33:15 UTC 2013
Thursday 7-8 PM UTC will be difficult for me. How about Monday 7-8 PM UTC (or 6-7 PM UTC)? Both slots are available on the #openstack-meeting channel.
thanks,
Robert
On 11/11/13 11:34 AM, "Jiang, Yunhong" <yunhong.jiang at intel.com> wrote:
Hi, Sandhya,
I'm in PST, so I'd prefer option 3 (7-8 PM UTC). Option 1 (2-3 PM UTC) is less preferred but still works (6-7 AM my time). Option 2 does work for me.
Thanks
--jyh
From: Sandhya Dasu (sadasu) [mailto:sadasu at cisco.com]
Sent: Thursday, November 07, 2013 6:44 PM
To: OpenStack Development Mailing List (not for usage questions); Jiang, Yunhong; Robert Li (baoli); Irena Berezovsky; prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi,
The discussions during the summit were very productive. Now we are ready to set up our IRC meeting.
Here are some slots that look like they might work for us.
1. Wed 2 – 3 pm UTC.
2. Thursday 12 – 1 pm UTC.
3. Thursday 7 – 8pm UTC.
Please vote.
Thanks,
Sandhya
From: Sandhya Dasu <sadasu at cisco.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Tuesday, November 5, 2013 12:03 PM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>, "Jiang, Yunhong" <yunhong.jiang at intel.com>, "Robert Li (baoli)" <baoli at cisco.com>, Irena Berezovsky <irenab at mellanox.com>, prashant.upadhyaya at aricent.com, chris.friesen at windriver.com, "He, Yongli" <yongli.he at intel.com>, Itzik Brown <ItzikB at mellanox.com>
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Just to clarify, the discussion is planned for 10 AM Wednesday morning at the developer's lounge.
Thanks,
Sandhya
From: Sandhya Dasu <sadasu at cisco.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Tuesday, November 5, 2013 11:38 AM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>, "Jiang, Yunhong" <yunhong.jiang at intel.com>, "Robert Li (baoli)" <baoli at cisco.com>, Irena Berezovsky <irenab at mellanox.com>, prashant.upadhyaya at aricent.com, chris.friesen at windriver.com, "He, Yongli" <yongli.he at intel.com>, Itzik Brown <ItzikB at mellanox.com>
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi,
We are planning to have a discussion at the developer's lounge tomorrow morning at 10:00 am. Please feel free to drop by if you are interested.
Thanks,
Sandhya
From: Jiang, Yunhong <yunhong.jiang at intel.com>
Date: Thursday, October 31, 2013 6:21 PM
To: "Robert Li (baoli)" <baoli at cisco.com>, Irena Berezovsky <irenab at mellanox.com>, prashant.upadhyaya at aricent.com, chris.friesen at windriver.com, "He, Yongli" <yongli.he at intel.com>, Itzik Brown <ItzikB at mellanox.com>
Cc: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>, "Brian Bowen (brbowen)" <brbowen at cisco.com>, "Kyle Mestery (kmestery)" <kmestery at cisco.com>, Sandhya Dasu <sadasu at cisco.com>
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support
Robert, I think your change request for the PCI alias should be covered by the extra info enhancement (https://blueprints.launchpad.net/nova/+spec/pci-extra-info), which Yongli is working on.
I'm not sure how the port profile is passed to the connected switch. Is it a Cisco VM-FEX specific method or a libvirt method? Sorry, I'm not very familiar with the networking side.
--jyh
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Wednesday, October 30, 2013 10:13 AM
To: Irena Berezovsky; Jiang, Yunhong; prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi,
Regarding physical network mapping, this is what I have in mind.
Consider the following scenarios:
1. A compute node with SRIOV-only interfaces attached to a physical network. The node is connected to one upstream switch.
2. A compute node with both SRIOV and non-SRIOV interfaces attached to a physical network. The node is connected to one upstream switch.
3. In addition to cases 1 and 2, a compute node may have multiple vNICs that are connected to different upstream switches.
CASE 1:
-- The mapping from a virtual network (in neutron terms) to a physical network is actually done by binding a port profile to a neutron port. With Cisco's VM-FEX, a port profile is associated with one or more VLANs. Once the neutron port is bound to this port profile in the upstream switch, it is effectively plugged into the physical network.
-- Since the compute node is connected to one upstream switch, the existing nova PCI alias will be sufficient. For example, one could boot a Nova instance attached to an SRIOV port with the following command:
nova boot --flavor m1.large --image <image-id> --nic net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
The net-id is used for allocating IP addresses, enabling DHCP, etc. on the associated network.
-- The pci-alias specified in the nova boot command is used to create a PCI request for scheduling purposes. In the case of nova boot, a PCI device is bound to a neutron port at instance build time: before invoking the neutron API to create the port, nova looks up the PCI device allocated for that PCI alias in the PCI device list object. This device information, among other things, is sent to neutron to create the port.
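The pci-alias, sriov and port-profile keys above are proposed in this thread, not existing novaclient syntax. A minimal Python sketch, assuming the comma-separated key=value format shown in the command, of how such a --nic value could be split and turned into a scheduling request; `parse_nic_option`, `build_pci_request` and the request dict layout are hypothetical, not nova code.

```python
from typing import Dict, Optional

# Values accepted for the proposed sriov= key (taken from the thread).
VALID_SRIOV_MODES = {"direct", "macvtap"}


def parse_nic_option(value: str) -> Dict[str, str]:
    """Split a hypothetical --nic value such as
    'net-id=N,pci-alias=A,sriov=direct,port-profile=P' into a dict."""
    nic: Dict[str, str] = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        key, val = key.strip(), val.strip()
        if not sep or not key or not val:
            raise ValueError(f"malformed --nic item: {item!r}")
        nic[key] = val
    mode = nic.get("sriov")
    if mode is not None and mode not in VALID_SRIOV_MODES:
        raise ValueError(
            f"sriov must be one of {sorted(VALID_SRIOV_MODES)}, got {mode!r}")
    return nic


def build_pci_request(nic: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Return a (hypothetical) PCI request for one device from the alias,
    or None when no pci-alias is given, keeping today's behavior."""
    alias = nic.get("pci-alias")
    if alias is None:
        return None
    return {"alias_name": alias, "count": 1}
```

Keeping the parse and the request construction separate mirrors the flow described above: the SRIOV keys are optional, and only their presence triggers a PCI request for the scheduler.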
CASE 2:
-- Assume that OVS is used for the non-SRIOV interfaces. An example of configuration with ovs plugin would look like:
bridge_mappings = physnet1:br-vmfex
network_vlan_ranges = physnet1:15:17
tenant_network_type = vlan
When a neutron network is created, a vlan is either allocated or specified in the neutron net-create command. Attaching a physical interface to the bridge (in the above example br-vmfex) is an administrative task.
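As a hedged illustration of what the two OVS plugin settings above express, the sketch below parses them and picks a free tenant VLAN for a new network. It is not neutron's own parser; `parse_bridge_mappings`, `parse_vlan_ranges` and `allocate_tenant_vlan` are hypothetical names, and the `in_use` set stands in for neutron's VLAN allocation table.

```python
from typing import Dict, List, Optional, Set, Tuple


def parse_bridge_mappings(value: str) -> Dict[str, str]:
    """'physnet1:br-vmfex,physnet2:br-eth2' -> {'physnet1': 'br-vmfex', ...}"""
    mappings: Dict[str, str] = {}
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue
        physnet, sep, bridge = entry.partition(":")
        if not sep or not physnet or not bridge:
            raise ValueError(f"bad bridge mapping: {entry!r}")
        mappings[physnet] = bridge
    return mappings


def parse_vlan_ranges(value: str) -> Dict[str, List[Tuple[int, int]]]:
    """'physnet1:15:17' -> {'physnet1': [(15, 17)]}.
    A bare 'physnet1' is accepted with no tenant range."""
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue
        parts = entry.split(":")
        ranges.setdefault(parts[0], [])
        if len(parts) == 3:
            low, high = int(parts[1]), int(parts[2])
            if not 1 <= low <= high <= 4094:
                raise ValueError(f"bad VLAN range: {entry!r}")
            ranges[parts[0]].append((low, high))
        elif len(parts) != 1:
            raise ValueError(f"bad VLAN range entry: {entry!r}")
    return ranges


def allocate_tenant_vlan(ranges: Dict[str, List[Tuple[int, int]]],
                         physnet: str, in_use: Set[int]) -> Optional[int]:
    """Return the lowest unused VLAN on physnet, or None if exhausted."""
    for low, high in ranges.get(physnet, []):
        for vlan in range(low, high + 1):
            if vlan not in in_use:
                return vlan
    return None
```

With the example settings, VLANs 15 to 17 on physnet1 are handed out in order, and the bridge that carries them is br-vmfex.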
-- To create a Nova instance with a non-SRIOV port:
nova boot --flavor m1.large --image <image-id> --nic net-id=<net>
-- To create a Nova instance with an SRIOV port:
nova boot --flavor m1.large --image <image-id> --nic net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
It's essentially the same as in the first case. But since the net-id is already associated with a VLAN, the VLAN associated with the port profile must be identical to that VLAN. This has to be enforced by neutron.
Again, since the node is connected to one upstream switch, the existing nova PCI alias should be sufficient.
CASE 3:
-- A compute node might be connected to multiple upstream switches, each being a separate network. This means SRIOV PFs/VFs are already implicitly associated with physical networks. In the non-SRIOV case, a physical interface is associated with a physical network by plugging it into that network and attaching the interface to the OVS bridge that represents this physical network on the compute node. In the SRIOV case, we need a way to group the SRIOV VFs that belong to the same physical network. The existing nova PCI alias facilitates PCI device allocation by associating <product_id, vendor_id> with an alias name. This will no longer be sufficient, but it can be enhanced to achieve our goal. For example, the PCI device domain and bus (if their mapping to vNICs is fixed across boots) may be added to the alias, and the alias name should correspond to a list of tuples.
Another consideration is that a VF or PF might be used on the host for other purposes. For example, a neutron DHCP server could be bound to a VF. Therefore, there needs to be a method to exclude some VFs from a group. One way is to associate an exclude list with an alias.
The enhanced PCI alias can be used to support features other than neutron as well. Essentially, a PCI alias can be defined as a group of PCI devices associated with a feature. I think this should be addressed in a separate blueprint.
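A minimal sketch of the enhanced alias described above, assuming an alias is a list of device specs (vendor_id, product_id, optional PCI domain and bus) plus an exclude list of PCI addresses reserved for host use. `PciDevice`, `DeviceSpec`, `EnhancedAlias` and the matching rule are hypothetical illustrations; the existing nova alias only associates <product_id, vendor_id> with a name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PciDevice:
    address: str      # "dddd:bb:ss.f", e.g. "0000:03:10.1"
    vendor_id: str
    product_id: str

    @property
    def domain(self) -> str:
        return self.address.split(":")[0]

    @property
    def bus(self) -> str:
        return self.address.split(":")[1]


@dataclass(frozen=True)
class DeviceSpec:
    vendor_id: str
    product_id: str
    domain: Optional[str] = None  # None matches any domain
    bus: Optional[str] = None     # None matches any bus

    def matches(self, dev: PciDevice) -> bool:
        return (dev.vendor_id == self.vendor_id
                and dev.product_id == self.product_id
                and self.domain in (None, dev.domain)
                and self.bus in (None, dev.bus))


@dataclass
class EnhancedAlias:
    name: str
    specs: List[DeviceSpec]                            # the "list of tuples"
    exclude: List[str] = field(default_factory=list)   # addresses kept for host use

    def members(self, devices: List[PciDevice]) -> List[PciDevice]:
        """Devices that match any spec and are not excluded."""
        return [d for d in devices
                if d.address not in self.exclude
                and any(s.matches(d) for s in self.specs)]
```

For example (IDs are illustrative), an alias for one physical network could list the VFs on bus 03 and exclude the one bound to a DHCP server, while VFs on bus 04 belong to another network's alias.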
Thanks,
Robert
On 10/30/13 12:59 AM, "Irena Berezovsky" <irenab at mellanox.com<mailto:irenab at mellanox.com>> wrote:
Hi,
Please see my answers inline
From: Jiang, Yunhong [mailto:yunhong.jiang at intel.com]
Sent: Tuesday, October 29, 2013 10:17 PM
To: Irena Berezovsky; Robert Li (baoli); prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support
Your explanation of the virtual network and physical network is quite clear and should work well. We need to change the nova code to achieve it, including getting the physical network for the virtual network, passing the physical network requirement to the filter properties, etc.
[IrenaB] The physical network is already available to nova at networking/nova/api as a virtual network attribute, and it is then passed to the VIF driver. We will soon push the fix for https://bugs.launchpad.net/nova/+bug/1239606, which will provide general support for getting this information.
Regarding your port method, do you mean that we pass the network id to 'nova boot' and nova creates the port during VM boot? Also, how does nova know that it needs to allocate a PCI device for the port? I'd suppose that in an SR-IOV NIC environment, the user doesn't need to specify the PCI requirement. Instead, the PCI requirement should come from the network configuration and image properties. Or do you think the user still needs to pass a flavor with a PCI request?
[IrenaB] There are two ways to apply the port method. One is to pass the network id on nova boot and use the default vnic type chosen in the neutron config file. The other is to define a port with the required vnic type and other properties, if applicable, and run 'nova boot' with the port id argument. Going forward, with nova support for PCI device awareness, we do need a way to influence the scheduler's choice so that the VM lands on a suitable host with an available PCI device that has the required connectivity.
--jyh
From: Irena Berezovsky [mailto:irenab at mellanox.com]
Sent: Tuesday, October 29, 2013 3:17 AM
To: Jiang, Yunhong; Robert Li (baoli); prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie the requested virtual network(s) to the requested PCI device(s). The way we did it in our solution is to map a provider:physical_network to an interface that represents the Physical Function. Every virtual network is bound to a provider:physical_network, so the PCI device should be allocated based on this mapping. We can map a PCI alias to the provider:physical_network.
Another topic to discuss is where the mapping between a neutron port and a PCI device should be managed. One way to solve it is to propagate the allocated PCI device details to neutron on port creation.
If there is no 802.1Qbg/802.1Qbh support, the VF networking configuration should be applied locally on the host.
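One way to read the mapping above, sketched in Python under stated assumptions: each host exposes a `physnet_to_pf` mapping (provider:physical_network to PF interface) and a `pf_to_vfs` inventory of VF PCI addresses; both dicts and the helper names are hypothetical. A VF is eligible only if it belongs to the PF mapped to the network's physical network, so a scheduler filter could pass a host only when `free_vfs` returns something.

```python
from typing import Dict, List, Optional, Set


def free_vfs(physical_network: str,
             physnet_to_pf: Dict[str, str],
             pf_to_vfs: Dict[str, List[str]],
             allocated: Set[str]) -> List[str]:
    """Unallocated VFs (by PCI address) on the PF mapped to physical_network."""
    pf = physnet_to_pf.get(physical_network)
    if pf is None:
        return []  # this host has no PF on that physical network
    return [vf for vf in pf_to_vfs.get(pf, []) if vf not in allocated]


def pick_vf(physical_network: str,
            physnet_to_pf: Dict[str, str],
            pf_to_vfs: Dict[str, List[str]],
            allocated: Set[str]) -> Optional[str]:
    """Choose the first free VF for a vNIC on physical_network, if any."""
    candidates = free_vfs(physical_network, physnet_to_pf, pf_to_vfs, allocated)
    return candidates[0] if candidates else None
```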
The question is when and how to apply networking configuration on the PCI device?
We see the following options:
· It can be done on port creation.
· It can be done when the nova VIF driver is called for vNIC plugging. This requires either all networking configuration to be available to the VIF driver, or the VIF driver to send a request to the neutron server to obtain it.
· It can be done by a dedicated L2 neutron agent on each host that scans for allocated PCI devices, then retrieves the networking configuration from the server and configures the device. The agent will also be responsible for handling update requests coming from the neutron server.
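The third option can be pictured as a polling loop that diffs the set of allocated devices between scans. A minimal sketch of one iteration, assuming the agent is handed four callables; `scan`, `fetch_config`, `apply_config` and `clear_config` are hypothetical placeholders for the host scan, the neutron server query and the device configuration, not an existing neutron agent API.

```python
from typing import Any, Callable, Dict, Set, Tuple


def reconcile(current: Set[str], previous: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Devices newly allocated since the last scan, and devices released."""
    return current - previous, previous - current


def agent_iteration(scan: Callable[[], Set[str]],
                    fetch_config: Callable[[str], Dict[str, Any]],
                    apply_config: Callable[[str, Dict[str, Any]], None],
                    clear_config: Callable[[str], None],
                    previous: Set[str]) -> Set[str]:
    """One pass: configure new devices, clean up removed ones,
    and return the current set for the next pass."""
    current = scan()
    added, removed = reconcile(current, previous)
    for dev in sorted(added):
        apply_config(dev, fetch_config(dev))  # e.g. VLAN, MAC, profile
    for dev in sorted(removed):
        clear_config(dev)
    return current
```

Diffing against the previous scan keeps each pass idempotent: a device is configured once when it appears and cleaned up once when it disappears.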
For macvtap vNIC type assignment, the networking configuration can be applied by a dedicated L2 neutron agent.
BR,
Irena
From: Jiang, Yunhong [mailto:yunhong.jiang at intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM
To: Robert Li (baoli); Irena Berezovsky; prashant.upadhyaya at aricent.com; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support
Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting because it's more in the OpenStack style and also keeps clear minutes.
Regarding your flow, can you give a more detailed example? For example, suppose the user boots the instance with a --nic option that specifies a network id: how does nova derive the PCI device requirement from it? I assume the network id should define the switches that the device can connect to, but how is that information translated into the PCI property requirement? Will this translation happen before the nova scheduler makes its host decision?
Thanks
--jyh
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; prashant.upadhyaya at aricent.com; Jiang, Yunhong; chris.friesen at windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Irena,
Thank you very much for your comments. See inline.
--Robert
On 10/27/13 3:48 AM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
Hi Robert,
Thank you very much for sharing the information about your efforts. Can you please share your idea of the end-to-end flow? How do you suggest binding Nova and Neutron?
In a nutshell, the end-to-end flow is covered by the blueprints; I will reiterate it below. The binding between Nova and Neutron occurs through the neutron v2 API that nova invokes to provision the neutron services. The VIF driver is responsible for plugging an instance into the networking setup that neutron has created on the host.
Normally, one invokes the "nova boot" API with the --nic option to specify the NIC through which the instance will be connected to the network. It currently allows net-id, fixed IP and/or port-id to be specified for the option. However, it doesn't allow one to specify special networking requirements for the instance. Thanks to the nova pci-passthrough work, one can specify PCI passthrough device(s) in the nova flavor, but for Ethernet adapters it provides no means to tie these PCI devices to networking services. Therefore the idea is actually simple, as the blueprint titles indicate: provide a means to tie SRIOV devices to neutron services. A workflow for 'nova boot' would roughly look like this:
-- Specify networking requirements in the --nic option. Specifically for SRIOV, allow the following to be specified in addition to the existing required information:
. PCI alias
. direct pci-passthrough/macvtap
. port profileid that is compliant with 802.1Qbh
The above information is optional. In the absence of them, the existing behavior remains.
-- If special networking requirements exist, the Nova API creates PCI requests in the nova instance type for scheduling purposes.
-- Nova scheduler schedules the instance based on the requested flavor plus the PCI requests that are created for networking.
-- Nova compute invokes neutron services with PCI passthrough information if any
-- Neutron performs its normal operations based on the request, such as allocating a port, assigning IP addresses, etc. Specific to SRIOV, it should validate information such as the profileid and store it in its db. It's also possible to associate a port profileid with a neutron network so that the port profileid becomes optional in the --nic option. Neutron returns the port information to nova, including the PCI passthrough related information in the port binding object. Currently, the port binding object contains the following information:
binding:vif_type
binding:host_id
binding:profile
binding:capabilities
-- Nova constructs the domain XML and plugs in the instance by calling the VIF driver. The VIF driver can build the interface XML based on the port binding information.
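To illustrate the last step, here is a hedged sketch of a VIF-driver-style helper that renders a libvirt interface element from port binding data. The element layout follows libvirt's documented hostdev and direct (macvtap) interfaces with an 802.1Qbh virtualport; the function name and the assumption that binding:profile carries a PCI address, a VF netdev name and a profileid are hypothetical. Note the naming clash: the thread's sriov=direct means PCI passthrough (libvirt type='hostdev'), while libvirt's own type='direct' is macvtap.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def _pci_address_attrs(addr: str) -> Dict[str, str]:
    """'0000:03:10.1' -> attributes for libvirt's <address type='pci'>."""
    domain, bus, slot_fn = addr.split(":")
    slot, function = slot_fn.split(".")
    return {"type": "pci", "domain": f"0x{domain}", "bus": f"0x{bus}",
            "slot": f"0x{slot}", "function": f"0x{function}"}


def build_interface_xml(vnic_type: str, mac: str, profileid: str,
                        pci_address: Optional[str] = None,
                        vf_netdev: Optional[str] = None) -> str:
    """Render an <interface> for 'direct' (PCI passthrough of the VF)
    or 'macvtap' (macvtap in passthrough mode on the VF netdev)."""
    if vnic_type == "direct":
        if pci_address is None:
            raise ValueError("direct passthrough needs a PCI address")
        iface = ET.Element("interface", type="hostdev", managed="yes")
        source = ET.SubElement(iface, "source")
        ET.SubElement(source, "address", _pci_address_attrs(pci_address))
    elif vnic_type == "macvtap":
        if vf_netdev is None:
            raise ValueError("macvtap needs the VF network device name")
        iface = ET.Element("interface", type="direct")
        ET.SubElement(iface, "source", dev=vf_netdev, mode="passthrough")
    else:
        raise ValueError(f"unsupported vnic type: {vnic_type!r}")
    ET.SubElement(iface, "mac", address=mac)
    # 802.1Qbh port profile, as used by VM-FEX.
    vport = ET.SubElement(iface, "virtualport", type="802.1Qbh")
    ET.SubElement(vport, "parameters", profileid=profileid)
    return ET.tostring(iface, encoding="unicode")
```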
The blueprints you registered make sense. On the Nova side, there is a need to bind the requested virtual network to the PCI device/interface to be allocated as the vNIC.
On the Neutron side, there is a need to support networking configuration of the vNIC. Neutron should be able to identify the PCI device/macvtap interface in order to apply the configuration. I think it makes sense to provide neutron integration via a dedicated Modular Layer 2 Mechanism Driver to allow PCI pass-through vNIC support alongside other networking technologies.
I haven't sorted through this yet. Whether or not a neutron port is associated with a PCI device is a common feature, IMHO. However, an ML2 driver may be needed that is specific to a particular SRIOV technology.
During the Havana release, we introduced the Mellanox Neutron plugin, which enables networking via SRIOV pass-through devices or macvtap interfaces.
We want to integrate our solution with the PCI pass-through support in Nova. I will be glad to share more details if you are interested.
Good to know that you already have an SRIOV implementation. I found some information online about the mlnx plugin, but need more time to get to know it better. And I'm certainly interested in knowing its details.
PCI pass-through networking support is planned to be discussed during the summit: http://summit.openstack.org/cfp/details/129. I think it's worth drilling down into a more detailed proposal and presenting it during the summit, especially since it impacts both the nova and neutron projects.
I agree. Maybe we can steal some time in that discussion.
Would you be interested in collaborating on this effort? Would you be interested in exchanging more emails or setting up an IRC/WebEx meeting this week, before the summit?
Sure. If folks want to discuss it before the summit, we can schedule a webex later this week. Or otherwise, we can continue the discussion with email.
Regards,
Irena
From: Robert Li (baoli) [mailto:baoli at cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhyaya at aricent.com; Irena Berezovsky; yunhong.jiang at intel.com; chris.friesen at windriver.com; yongli.he at intel.com
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Hi Irena,
This is Robert Li from Cisco Systems. Recently, I was tasked with investigating such support for Cisco's systems that support VM-FEX, which is an SRIOV technology supporting 802.1Qbh. I was able to bring up nova instances with SRIOV interfaces and establish networking between the instances that employ the SRIOV interfaces. Of course, this was accomplished with hacking and some manual intervention. Based on this experience and my study of the two existing nova pci-passthrough blueprints that have been implemented and committed into Havana (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt), I registered a couple of blueprints (one on Nova side, the other on the Neutron side):
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov
in order to address SRIOV support in OpenStack.
Please take a look at them, see if they make sense, and let me know any comments and questions. We can also discuss this at the summit, I suppose.
I noticed that there is another thread on this topic, so I copied the folks from that thread as well.
thanks,
Robert
On 10/16/13 4:32 PM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
Hi,
As one of the next steps for PCI pass-through, I would like to discuss support for PCI pass-through vNICs.
While nova takes care of PCI pass-through device resource management and VIF settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss support for PCI pass-through networking.
I am not sure what the right topic would be for discussing PCI pass-through networking, since it involves both nova and neutron.
There is already a session registered by Yongli on the nova topic to discuss the PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth having a separate discussion.
Are there any other people who are interested in discussing it and sharing their thoughts and experience?
Regards,
Irena