[openstack-dev] [nova] [neutron] PCI pass-through network support

Ian Wells ijw.ubuntu at cack.org.uk
Thu Dec 12 14:40:06 UTC 2013


... And there's a new section at the bottom for how I think the Nova side
of passthrough should work.  Again, completely without the API calls, which
I don't personally think are a good idea.  This is done in isolation from the
current mechanism, but please take it as blue-sky thinking and tell me
where it's not practical.
-- 
Ian.

On 12 December 2013 02:08, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:

> I've had a set-to trying to define where I think we should go on the
> Neutron API.  There's a load of stuff in the document highlighted in blue
> in the Neutron section.  I've tried to explain both what I think we should
> do and also why I think we should do it that way, and I've also tried to
> keep it hypervisor, hardware and overlay agnostic.
>
> Now, on the allocation side of things, the solution I like best is:
>
> - PCI groups are groups of devices, arbitrarily selected by one or more
> PCI 'regular expressions'
> - the selection of devices is done in configuration on the individual
> compute nodes, rather than centrally on the control node; each device is
> labelled with a group name, and the group names are returned to the
> scheduler to form the pools of devices for allocation
> - the compute service can keep a tally of allocations (and free resources)
> while it's running, and recalculate allocations when it restarts (with help
> from the compute driver, which can tell what's allocated out to running
> instances); it has none of its own persistent storage and doesn't rely on
> central storage either
> - the PCI device counts per group name (not lists) are returned to the
> scheduler node, which sums them and keeps them handy for scheduling
> - the extra_specs in the flavor are used for scheduling (not the --nic
> parameters; the document explains in detail why); a sketch of the per-node
> grouping and counting appears just after this list
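>
> Purely as an illustration of the proposal above (not of any existing nova
> code): a minimal sketch of how a compute node might label its local devices
> into PCI groups from its own configuration and report only per-group counts
> to the scheduler. The group names, match expressions and helper names here
> are all hypothetical.
>
>     import fnmatch
>
>     # Hypothetical per-node configuration: group name -> match expressions
>     # (wildcards allowed) over PCI device properties.
>     PCI_GROUPS = {
>         'fast-nics': [{'vendor_id': '8086', 'address': '0000:07:*.*'}],
>         'gpus':      [{'vendor_id': '10de', 'product_id': '*'}],
>     }
>
>     def group_of(device):
>         """Return the group name a local PCI device falls into, if any."""
>         for group, exprs in PCI_GROUPS.items():
>             for expr in exprs:
>                 if all(fnmatch.fnmatch(device.get(k, ''), pattern)
>                        for k, pattern in expr.items()):
>                     return group
>         return None
>
>     def group_counts(local_devices):
>         """Count free devices per group; only these counts (not device
>         lists) are reported to the scheduler."""
>         counts = {}
>         for dev in local_devices:
>             group = group_of(dev)
>             if group and not dev.get('allocated'):
>                 counts[group] = counts.get(group, 0) + 1
>         return counts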
>
> Since I needed to choose something as a base assumption, I've chosen
> this.  If I get a moment tomorrow I'll write that up as well (and similarly
> mark it with a colour, because the top half of the document contains
> several conflicting ideas and I'd like to make it clear that I intend the
> proposed bit to be self-consistent).
>
> I know there's discussion of an API to program this, but I'm going to
> ignore that for the purposes of this proposal.  Firstly, I don't think this
> is something you can change unless you're an admin, and an admin with
> intimate knowledge of the hardware at that, so I don't see why you would
> want a central API for making it 'simpler'.  Secondly, I don't think this
> is information that needs to be stored centrally at all - the distributed
> nature of having the compute nodes report what they have just seems more
> 'right' to me.  And things like connection information or connected
> resources that might need to be specified per device again feel like
> something best done where the hardware lives, not in a central database.
> And finally, you can still change what devices you offer: log into a
> compute node, change the config, restart the compute service, done.
> The difference is that I don't have to call an API every time I add or
> remove a compute node.
>
>
>
> https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit
> --
> Ian.
>
>
> On 11 December 2013 15:18, Robert Li (baoli) <baoli at cisco.com> wrote:
>
>>  Hi Yongli,
>>
>>  Thank you very much for sharing the Wiki with us on Monday so that we
>> have a better understanding on your ideas and thoughts. Please see embedded
>> comments.
>>
>>  --Robert
>>
>>   On 12/10/13 8:35 PM, "yongli he" <yongli.he at intel.com> wrote:
>>
>>   On 10 December 2013 22:41, Sandhya Dasu (sadasu) wrote:
>>
>> Hi,
>>    I am trying to resurrect this email thread since discussions have
>> split across several threads and are becoming hard to keep track of.
>>
>>  An update:
>>
>>  New PCI Passthrough meeting time: Tuesdays UTC 1400.
>>
>>  New PCI flavor proposal from Nova:
>>
>> https://wiki.openstack.org/wiki/PCI_configration_Database_and_API#Take_advantage_of_host_aggregate_.28T.B.D.29
>>
>> Hi all,
>>   Sorry for missing the meeting; I was looking for John at that time. From
>> the log I saw some concerns about the new design. I list them here and try
>> to clarify each from my point of view:
>>
>> 1. Configuration going to be deprecated: this might impact SRIOV. If
>> possible, please list what kind of impact this has on you.
>>
>>
>>  Regarding the nova API pci-flavor-update, we had a face-to-face
>> discussion over the use of a nova API to provision/define/configure the PCI
>> passthrough list during the Icehouse summit. I kind of liked the idea
>> initially. As you can see from the meeting log, however, I later thought
>> that in a distributed system, using a centralized API to define resources
>> per compute node, which could come and go at any time, doesn't seem to provide
>> any significant benefit. This is the reason that I didn't mention it in our
>> google doc
>> https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit#
>>
>>  If you agree that pci-flavor and pci-group are kind of the same thing,
>> then we agree with you that the pci-flavor-create API is needed. Since a
>> pci-flavor or pci-group is global, such an API can be used for
>> resource registration/validation on the nova server. In addition, it can be
>> used to facilitate the display of PCI devices per node, per group, or in
>> the entire cloud, etc.
>>
>>
>>
>>
>> 2. <baoli>So the API seems to be combining the whitelist + pci-group
>>     Yeah, it's actually almost the same thing: 'flavor', 'pci-group', or
>> 'group'. The real difference is that this flavor is going to deprecate the
>> alias, and be tied tightly to the host aggregate or the flavor.
>>
>>
>>  Well, with pci-group, we recommended deprecating the PCI alias because
>> we think it is redundant.
>>
>>  We think that specifying the PCI requirement in the flavor's extra
>> spec is still needed, as it's a generic means to allocate PCI devices. In
>> addition, it can be used as a property in the host aggregate as well.
>>
>>
>>
>> 3. Features:
>>    This design is not saying that features won't work, but they will change.
>> If auto discovery is possible, we get the 'features' from the device and then
>> use those features to define the pci-flavor. It's also possible to create a
>> default pci-flavor for this. So the feature concept will be impacted; my
>> feeling is that features should get a separate blueprint and not be part of
>> this round of changes, so the only thing here is to keep features possible.
>>
>>
>>  I think that it's ok to have separate BPs. But we think that auto
>> discovery is an essential part of the design, and therefore it should be
>> implemented with more helping hands.
>>
>>
>>
>> 4. Address regular expressions: I'm fine with the wildcard-match style.
>>
>>
>>  Sounds good. One side note is that I noticed that the driver for Intel
>> 82576 cards has a strange slot assignment scheme. So the final definition
>> of it may need to accommodate that as well.
>>
>>
>>
>> 5. Flavor style for SRIOV: I just listed the flavor style in the design,
>> but the style
>>               --nic
>>                    --pci-flavor  PowerfullNIC:1
>>    should still work, so what's the real impact to SRIOV from the
>> flavor design?
>>
>>
>>  As you can see from the log, Irena has some strong opinions on this,
>> and I tend to agree with her. The problem we need to solve is this: we need
>> a means to associate a nic (or port) with a PCI device that is allocated
>> out of a PCI flavor or a PCI group. We think that we presented a complete
>> solution in our google doc.
>>
>>
>>  At this point, I really believe that we should combine our efforts and
>> ideas. As far as how many BPs are needed, it should be a trivial matter
>> after we have agreed on a complete solution.
>>
>>
>>
>> Yongli He
>>
>>
>>
>>  Thanks,
>> Sandhya
>>
>>
>>   From: Sandhya Dasu <sadasu at cisco.com>
>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Date: Thursday, November 7, 2013 9:44 PM
>> To: "OpenStack Development Mailing List (not for usage questions)" <
>> openstack-dev at lists.openstack.org>, "Jiang, Yunhong" <
>> yunhong.jiang at intel.com>, "Robert Li (baoli)" <baoli at cisco.com>, Irena
>> Berezovsky <irenab at mellanox.com>, "prashant.upadhyaya at aricent.com" <
>> prashant.upadhyaya at aricent.com>, "chris.friesen at windriver.com" <
>> chris.friesen at windriver.com>, "He, Yongli" <yongli.he at intel.com>, Itzik
>> Brown <ItzikB at mellanox.com>
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>   Hi,
>>      The discussions during the summit were very productive. Now, we are
>> ready to setup our IRC meeting.
>>
>>  Here are some slots that look like they might work for us.
>>
>>  1. Wed 2 – 3 pm UTC.
>> 2. Thursday 12 – 1 pm UTC.
>> 3. Thursday 7 – 8pm UTC.
>>
>>  Please vote.
>>
>>  Thanks,
>> Sandhya
>>
>>   From: Sandhya Dasu <sadasu at cisco.com>
>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Date: Tuesday, November 5, 2013 12:03 PM
>> To: "OpenStack Development Mailing List (not for usage questions)" <
>> openstack-dev at lists.openstack.org>, "Jiang, Yunhong" <
>> yunhong.jiang at intel.com>, "Robert Li (baoli)" <baoli at cisco.com>, Irena
>> Berezovsky <irenab at mellanox.com>, "prashant.upadhyaya at aricent.com" <
>> prashant.upadhyaya at aricent.com>, "chris.friesen at windriver.com" <
>> chris.friesen at windriver.com>, "He, Yongli" <yongli.he at intel.com>, Itzik
>> Brown <ItzikB at mellanox.com>
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>   Just to clarify, the discussion is planned for 10 AM Wednesday morning
>> at the developer's lounge.
>>
>>  Thanks,
>> Sandhya
>>
>>   From: Sandhya Dasu <sadasu at cisco.com>
>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Date: Tuesday, November 5, 2013 11:38 AM
>> To: "OpenStack Development Mailing List (not for usage questions)" <
>> openstack-dev at lists.openstack.org>, "Jiang, Yunhong" <
>> yunhong.jiang at intel.com>, "Robert Li (baoli)" <baoli at cisco.com>, Irena
>> Berezovsky <irenab at mellanox.com>, "prashant.upadhyaya at aricent.com" <
>> prashant.upadhyaya at aricent.com>, "chris.friesen at windriver.com" <
>> chris.friesen at windriver.com>, "He, Yongli" <yongli.he at intel.com>, Itzik
>> Brown <ItzikB at mellanox.com>
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>   Hi,
>>     We are planning to have a discussion at the developer's lounge
>> tomorrow morning at 10:00 am. Please feel free to drop by if you are
>> interested.
>>
>>  Thanks,
>> Sandhya
>>
>>  From: <Jiang>, Yunhong <yunhong.jiang at intel.com>
>>    Date: Thursday, October 31, 2013 6:21 PM
>> To: "Robert Li (baoli)" <baoli at cisco.com>, Irena Berezovsky <
>> irenab at mellanox.com>, "prashant.upadhyaya at aricent.com" <
>> prashant.upadhyaya at aricent.com>, "chris.friesen at windriver.com" <
>> chris.friesen at windriver.com>, "He, Yongli" <yongli.he at intel.com>, Itzik
>> Brown <ItzikB at mellanox.com>
>> Cc: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>,
>> "Brian Bowen (brbowen)" <brbowen at cisco.com>, "Kyle Mestery (kmestery)" <
>> kmestery at cisco.com>, Sandhya Dasu <sadasu at cisco.com>
>> Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>   Robert, I think your change request for the pci alias should be covered by
>> the extra info enhancement,
>> https://blueprints.launchpad.net/nova/+spec/pci-extra-info , and Yongli
>> is working on it.
>>
>>
>>
>> I'm not sure how the port profile is passed to the connected switch; is
>> it a Cisco VM-FEX specific method or a libvirt method? Sorry, I'm not well
>> versed on the network side.
>>
>>
>>
>> --jyh
>>
>>
>>
>> *From:* Robert Li (baoli) [mailto:baoli at cisco.com <baoli at cisco.com>]
>> *Sent:* Wednesday, October 30, 2013 10:13 AM
>> *To:* Irena Berezovsky; Jiang, Yunhong; prashant.upadhyaya at aricent.com;
>> chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Hi,
>>
>>
>>
>> Regarding physical network mapping, this is what I thought.
>>
>>
>>
>> Consider the following scenarios:
>>
>>    1. a compute node with SRIOV-only interfaces attached to a physical
>> network; the node is connected to one upstream switch
>>
>>    2. a compute node with both SRIOV interfaces and non-SRIOV interfaces
>> attached to a physical network; the node is connected to one upstream switch
>>
>>    3. in addition to cases 1 & 2, a compute node may have multiple vNICs
>> that are connected to different upstream switches.
>>
>>
>>
>> CASE 1:
>>
>>  -- the mapping from a virtual network (in terms of neutron) to a
>> physical network is actually done by binding a port profile to a neutron
>> port. With cisco's VM-FEX, a port profile is associated with one or
>> multiple vlans. Once the neutron port is bound with this port-profile in
>> the upstream switch, it's effectively plugged into the physical network.
>>
>>  -- since the compute node is connected to one upstream switch, the
>> existing nova PCI alias will be sufficient. For example, one can boot a
>> Nova instance that is attached to a SRIOV port with the following command:
>>
>>           nova boot --flavor m1.large --image <image-id> --nic
>> net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
>>
>>     the net-id will be useful for allocating the IP address, enabling
>> dhcp, and the other things associated with the network.
>>
>> -- the pci-alias specified in the nova boot command is used to create a
>> PCI request for scheduling purposes. A PCI device is bound to a neutron port
>> during instance build time in the case of nova boot. Before invoking
>> the neutron API to create a port, an allocated PCI device from the PCI
>> alias will be located in the PCI device list object. This device info,
>> among other information, will be sent to neutron to create the port.
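>>
>> As a minimal sketch (not actual nova code) of the flow just described: the
>> pci-alias from the --nic option becomes a PCI request for the scheduler, and
>> once a device has been allocated on the chosen host its details are included
>> in the neutron port-create call. All names and dictionary keys here are
>> illustrative assumptions.
>>
>>     def build_pci_request(nic_spec):
>>         # nic_spec is the parsed --nic option, e.g.
>>         # {'net-id': NET, 'pci-alias': 'vmfex', 'sriov': 'direct',
>>         #  'port-profile': 'profile-1'}
>>         return {'alias_name': nic_spec['pci-alias'], 'count': 1}
>>
>>     def neutron_port_args(nic_spec, device):
>>         # 'device' is the PCI device picked from the host's device list;
>>         # its identity is handed to neutron so the port can be bound to it.
>>         return {
>>             'network_id': nic_spec['net-id'],
>>             'binding:profile': {
>>                 'pci_slot': device['address'],
>>                 'vendor_id': device['vendor_id'],
>>                 'product_id': device['product_id'],
>>                 'port_profile': nic_spec.get('port-profile'),
>>             },
>>         }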
>>
>>
>>
>> CASE 2:
>>
>> -- Assume that OVS is used for the non-SRIOV interfaces. An example of
>> configuration with ovs plugin would look like:
>>
>>             bridge_mappings = physnet1:br-vmfex
>>
>>             network_vlan_ranges = physnet1:15:17
>>
>>             tenant_network_type = vlan
>>
>>     When a neutron network is created, a vlan is either allocated or
>> specified in the neutron net-create command. Attaching a physical interface
>> to the bridge (in the above example br-vmfex) is an administrative task.
>>
>> -- to create a Nova instance with non-SRIOV port:
>>
>>            nova boot --flavor m1.large --image <image-id> --nic net-id=<net>
>>
>> -- to create a Nova instance with SRIOV port:
>>
>>            nova boot --flavor m1.large --image <image-id> --nic
>> net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
>>
>>     it's essentially the same as in the first case. But since the net-id
>> is already associated with a vlan, the vlan associated with the
>> port-profile must be identical to that vlan. This has to be enforced by
>> neutron.
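>>
>> A sketch of the kind of check neutron would need for this, assuming the
>> plugin can see both the network's VLAN (segmentation id) and the VLANs
>> carried by the port-profile; the function and field names are hypothetical.
>>
>>     def validate_port_profile(network, port_profile):
>>         """Reject a binding whose port-profile does not carry the VLAN
>>         already associated with the neutron network."""
>>         net_vlan = network['provider:segmentation_id']
>>         if net_vlan not in port_profile['vlans']:
>>             raise ValueError(
>>                 "port-profile %s does not carry VLAN %s of network %s"
>>                 % (port_profile['name'], net_vlan, network['id']))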
>>
>>     again, since the node is connected to one upstream switch, the
>> existing nova PCI alias should be sufficient.
>>
>>
>>
>> CASE 3:
>>
>> -- A compute node might be connected to multiple upstream switches, with
>> each being a separate network. This means SRIOV PFs/VFs are already
>> implicitly associated with physical networks. In the non-SRIOV case, a
>> physical interface is associated with a physical network by plugging it
>> into that network, and attaching this interface to the ovs bridge that
>> represents this physical network on the compute node. In the SRIOV case, we
>> need a way to group the SRIOV VFs that belong to the same physical
>> networks. The existing nova PCI alias facilitates PCI device
>> allocation by associating <product_id, vendor_id> with an alias name. This
>> will no longer be sufficient, but it can be enhanced to achieve our goal.
>> For example, the PCI device domain and bus (if their mapping to vNICs is
>> fixed across boots) may be added into the alias, and the alias name should
>> then correspond to a list of tuples.
>>
>>
>>
>> Another consideration is that a VF or PF might be used on the host for
>> other purposes. For example, it's possible for a neutron DHCP server to be
>> bound to a VF. Therefore, there needs to be a method to exclude some VFs from
>> a group. One way is to associate an exclude list with an alias.
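>>
>> Purely as an illustration of what such an enhanced alias could look like as
>> data, with address matching and an exclude list; none of these keys are
>> claimed to exist in nova today.
>>
>>     # One alias name corresponds to a list of matching tuples plus an
>>     # optional exclusion list of VFs reserved for host use.
>>     ENHANCED_PCI_ALIAS = {
>>         'physnet1-vfs': {
>>             'devices': [
>>                 {'vendor_id': '8086', 'product_id': '10ed',
>>                  'domain': '0000', 'bus': '07'},
>>             ],
>>             # e.g. a VF kept back for a neutron DHCP server on the host
>>             'exclude': ['0000:07:10.1'],
>>         },
>>     }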
>>
>>
>>
>> The enhanced PCI alias can be used to support features other than neutron
>> as well. Essentially, a PCI alias can be defined as a group of PCI devices
>> associated with a feature. I'd think that this should be addressed with a
>> separate blueprint.
>>
>>
>>
>> Thanks,
>>
>> Robert
>>
>>
>>
>> On 10/30/13 12:59 AM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
>>
>>
>>
>>  Hi,
>>
>> Please see my answers inline
>>
>>
>>
>> *From:* Jiang, Yunhong [mailto:yunhong.jiang at intel.com<yunhong.jiang at intel.com>]
>>
>> *Sent:* Tuesday, October 29, 2013 10:17 PM
>> *To:* Irena Berezovsky; Robert Li (baoli); prashant.upadhyaya at aricent.com;
>> chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* RE: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Your explanation of the virtual network and physical network is quite
>> clear and should work well. We need to change nova code to achieve it,
>> including getting the physical network for the virtual network, passing the
>> physical network requirement to the filter properties, etc.
>>
>> *[IrenaB] * The physical network is already available to nova at the
>> networking API layer as a virtual network attribute; it is then passed to the
>> VIF driver. We will soon push the fix to
>> https://bugs.launchpad.net/nova/+bug/1239606 , which will provide
>> general support for getting this information.
>>
>>
>>
>> For your port method, do you mean that we are sure to pass the network id to
>> 'nova boot' and nova will create the port during VM boot, am I right?
>>  Also, how does nova know that it needs to allocate a PCI device for the
>> port? I'd suppose that in an SR-IOV NIC environment, the user doesn't need to
>> specify the PCI requirement. Instead, the PCI requirement should come from
>> the network configuration and image properties. Or do you think the user
>> still needs to pass a flavor with a PCI request?
>>
>> *[IrenaB] There are two ways to apply the port method. One is to pass the
>> network id on nova boot and use the default vnic type as chosen in the neutron
>> config file. The other way is to define a port with the required vnic type and
>> other properties if applicable, and run 'nova boot' with the port id argument.
>> Going forward with nova support for PCI device awareness, we do need a way
>> to influence the scheduler choice to land the VM on a suitable host with an
>> available PCI device that has the required connectivity.*
>>
>>
>>
>> --jyh
>>
>>
>>
>>
>>
>> *From:* Irena Berezovsky [mailto:irenab at mellanox.com<irenab at mellanox.com>]
>>
>> *Sent:* Tuesday, October 29, 2013 3:17 AM
>> *To:* Jiang, Yunhong; Robert Li (baoli); prashant.upadhyaya at aricent.com;
>> chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* RE: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Hi Jiang, Robert,
>>
>> IRC meeting option works for me.
>>
>> If I understand your question below, you are looking for a way to tie
>> together the requested virtual network(s) and the requested PCI device(s).
>> The way we did it in our solution is to map a provider:physical_network to an
>> interface that represents the Physical Function. Every virtual network is
>> bound to the provider:physical_network, so the PCI device should be
>> allocated based on this mapping. We can map a PCI alias to the
>> provider:physical_network.
>>
>>
>>
>> Another topic to discuss is where the mapping between neutron port and
>> PCI device should be managed. One way to solve it, is to propagate the
>> allocated PCI device details to neutron on port creation.
>>
>> In case  there is no qbg/qbh support, VF networking configuration should
>> be applied locally on the Host.
>>
>> The question is when and how to apply networking configuration on the PCI
>> device?
>>
>> We see the following options:
>>
>> ‧         it can be done on port creation.
>>
>> ‧         It can be done when the nova VIF driver is called for vNIC
>> plugging. This will require having all networking configuration available
>> to the VIF driver, or sending a request to the neutron server to obtain it.
>>
>> ‧         It can be done by having a dedicated L2 neutron agent on each
>> host that scans for allocated PCI devices and then retrieves networking
>> configuration from the server and configures the device. The agent will
>> also be responsible for managing update requests coming from the neutron
>> server.
>>
>>
>>
>> For macvtap vNIC type assignment, the networking configuration can be
>> applied by a dedicated L2 neutron agent.
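>>
>> The dedicated-agent option could look roughly like the loop below; this is
>> only a sketch, assuming some host-side driver that can list allocated VFs
>> and an RPC client for fetching their port configuration from the neutron
>> server. All names are placeholders.
>>
>>     import time
>>
>>     def l2_sriov_agent_loop(host_driver, neutron_rpc, interval=2):
>>         """Scan for newly allocated VFs, fetch their networking
>>         configuration, apply it locally, and report the device up."""
>>         configured = set()
>>         while True:
>>             for vf in host_driver.list_allocated_vfs():
>>                 if vf.address in configured:
>>                     continue
>>                 port = neutron_rpc.get_port_for_device(vf.address)
>>                 host_driver.apply_config(vf, vlan=port['vlan'],
>>                                          mac=port['mac_address'])
>>                 neutron_rpc.update_device_up(vf.address)
>>                 configured.add(vf.address)
>>             time.sleep(interval)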
>>
>>
>>
>> BR,
>>
>> Irena
>>
>>
>>
>> *From:* Jiang, Yunhong [mailto:yunhong.jiang at intel.com<yunhong.jiang at intel.com>]
>>
>> *Sent:* Tuesday, October 29, 2013 9:04 AM
>>
>>
>> *To:* Robert Li (baoli); Irena Berezovsky; prashant.upadhyaya at aricent.com;
>> chris.friesen at windriver.com; He, Yongli; Itzik Brown
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* RE: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting
>> because it's more openstack style and also keeps the minutes clear.
>>
>>
>>
>> For your flow, can you give a more detailed example? For example, I can
>> consider the user specifying the instance with a --nic option giving a network
>> id; how does nova then derive the requirement for the PCI device? I assume the
>> network id should define the switches that the device can connect to, but
>> how is that information translated to the PCI property requirement? Will
>> this translation happen before the nova scheduler makes the host decision?
>>
>>
>>
>> Thanks
>>
>> --jyh
>>
>>
>>
>> *From:* Robert Li (baoli) [mailto:baoli at cisco.com <baoli at cisco.com>]
>> *Sent:* Monday, October 28, 2013 12:22 PM
>> *To:* Irena Berezovsky; prashant.upadhyaya at aricent.com; Jiang, Yunhong;
>> chris.friesen at windriver.com; He, Yongli; Itzik Brown
>>
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Hi Irena,
>>
>>
>>
>> Thank you very much for your comments. See inline.
>>
>>
>>
>> --Robert
>>
>>
>>
>> On 10/27/13 3:48 AM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
>>
>>
>>
>>  Hi Robert,
>>
>> Thank you very much for sharing the information regarding your efforts.
>> Can you please share your idea of the end-to-end flow? How do you suggest
>> binding Nova and Neutron?
>>
>>
>>
>> The end-to-end flow is actually encompassed in the blueprints in a
>> nutshell. I will reiterate it below. The binding between Nova and
>> Neutron occurs with the neutron v2 API that nova invokes in order to
>> provision the neutron services. The vif driver is responsible for plugging
>> an instance into the networking setup that neutron has created on the
>> host.
>>
>>
>>
>> Normally, one will invoke the "nova boot" api with the --nic option to
>> specify the nic with which the instance will be connected to the network.
>> It currently allows net-id, fixed ip and/or port-id to be specified for the
>> option. However, it doesn't allow one to specify special networking
>> requirements for the instance. Thanks to the nova pci-passthrough work, one
>> can specify PCI passthrough device(s) in the nova flavor. But it doesn't
>> provide a means to tie these PCI devices (in the case of ethernet adapters)
>> to networking services. Therefore the idea is actually simple, as indicated
>> by the blueprint titles: to provide a means to tie SRIOV devices to neutron
>> services. A work flow would roughly look like this for 'nova
>> boot':
>>
>>
>>
>>       -- The user specifies networking requirements in the --nic option.
>> Specifically for SRIOV, allow the following to be specified in addition to
>> the existing required information:
>>
>>                . PCI alias
>>
>>                . direct pci-passthrough/macvtap
>>
>>                . port profileid that is compliant with 802.1Qbh
>>
>>
>>
>>         The above information is optional. In its absence, the
>> existing behavior remains.
>>
>>
>>
>>      -- if special networking requirements exist, the Nova api creates PCI
>> requests in the nova instance type for scheduling purposes
>>
>>
>>
>>      -- Nova scheduler schedules the instance based on the requested
>> flavor plus the PCI requests that are created for networking.
>>
>>
>>
>>      -- Nova compute invokes neutron services with PCI passthrough
>> information if any
>>
>>
>>
>>      --  Neutron performs its normal operations based on the request,
>> such as allocating a port, assigning ip addresses, etc. Specific to SRIOV,
>> it should validate information such as the profileid and store it in
>> its db. It's also possible to associate a port profileid with a neutron
>> network so that the port profileid becomes optional in the --nic option.
>> Neutron returns to nova the port information, especially the PCI passthrough
>> related information, in the port binding object. Currently, the port binding
>> object contains the following information:
>>
>>           binding:vif_type
>>
>>           binding:host_id
>>
>>           binding:profile
>>
>>           binding:capabilities
>>
>>
>>
>>     -- nova constructs the domain xml and plugs in the instance by calling
>> the vif driver. The vif driver can build up the interface xml based on the
>> port binding information.
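>>
>> As a sketch of this last step: the kind of libvirt interface XML a vif
>> driver could emit for a directly assigned VF, filled in from the port
>> binding. The binding keys used here are assumptions; the XML shape follows
>> libvirt's hostdev-type interface format.
>>
>>     def build_interface_xml(pci_address, profile_id, mac):
>>         # pci_address: dict with 'domain', 'bus', 'slot', 'function' taken
>>         # from the allocated device recorded in the port binding.
>>         a = pci_address
>>         return (
>>             "<interface type='hostdev' managed='yes'>\n"
>>             "  <mac address='%s'/>\n"
>>             "  <source>\n"
>>             "    <address type='pci' domain='0x%s' bus='0x%s'"
>>             " slot='0x%s' function='0x%s'/>\n"
>>             "  </source>\n"
>>             "  <virtualport type='802.1Qbh'>\n"
>>             "    <parameters profileid='%s'/>\n"
>>             "  </virtualport>\n"
>>             "</interface>\n"
>>             % (mac, a['domain'], a['bus'], a['slot'], a['function'],
>>                profile_id))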
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> The blueprints you registered make sense. On Nova side, there is a need
>> to bind between requested virtual network and PCI device/interface to be
>> allocated as vNIC.
>>
>> On the Neutron side, there is a need to  support networking configuration
>> of the vNIC. Neutron should be able to identify the PCI device/macvtap
>> interface in order to apply configuration. I think it makes sense to
>> provide neutron integration via dedicated Modular Layer 2 Mechanism Driver
>> to allow PCI pass-through vNIC support along with other networking
>> technologies.
>>
>>
>>
>> I haven't sorted through this yet. A neutron port could be associated
>> with a PCI device or not, which is a common feature, IMHO. However, an ML2
>> driver specific to a particular SRIOV technology may be needed.
>>
>>
>>
>>
>>
>> During the Havana release, we introduced the Mellanox Neutron plugin, which
>> enables networking via SRIOV pass-through devices or macvtap interfaces.
>>
>> We want to integrate our solution with PCI pass-through Nova support.  I
>> will be glad to share more details if you are interested.
>>
>>
>>
>>
>>
>> Good to know that you already have a SRIOV implementation. I found out
>> some information online about the mlnx plugin, but need more time to get to
>> know it better. And certainly I'm interested in knowing its details.
>>
>>
>>
>>  The PCI pass-through networking support is planned to be discussed
>> during the summit: http://summit.openstack.org/cfp/details/129. I think
>> it's worth drilling down into a more detailed proposal and presenting it
>> during the summit, especially since it impacts both the nova and neutron
>> projects.
>>
>>
>>
>>  I agree. Maybe we can steal some time in that discussion.
>>
>>
>>
>>  Would you be interested in collaborating on this effort? Would you be
>> interested in exchanging more emails or setting up an IRC/WebEx meeting this
>> week before the summit?
>>
>>
>>
>> Sure. If folks want to discuss it before the summit, we can schedule a
>> webex later this week. Or otherwise, we can continue the discussion with
>> email.
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>>
>> Irena
>>
>>
>>
>> *From:* Robert Li (baoli) [mailto:baoli at cisco.com <baoli at cisco.com>]
>> *Sent:* Friday, October 25, 2013 11:16 PM
>> *To:* prashant.upadhyaya at aricent.com; Irena Berezovsky;
>> yunhong.jiang at intel.com; chris.friesen at windriver.com; yongli.he at intel.com
>> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>> Mestery (kmestery); Sandhya Dasu (sadasu)
>> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Hi Irena,
>>
>>
>>
>> This is Robert Li from Cisco Systems. Recently, I was tasked to
>> investigate such support for Cisco's systems that support VM-FEX, which is
>> an SRIOV technology supporting 802.1Qbh. I was able to bring up nova
>> instances with SRIOV interfaces, and establish networking between the
>> instances that employ the SRIOV interfaces. Certainly, this was
>> accomplished with hacking and some manual intervention. Based on this
>> experience and my study with the two existing nova pci-passthrough
>> blueprints that have been implemented and committed into Havana (
>> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
>> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),  I
>> registered a couple of blueprints (one on Nova side, the other on the
>> Neutron side):
>>
>>
>>
>> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
>>
>> https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov
>>
>>
>>
>> in order to address SRIOV support in openstack.
>>
>>
>>
>> Please take a look at them and see if they make sense, and let me know
>> any comments and questions. We can also discuss this in the summit, I
>> suppose.
>>
>>
>>
>> I noticed that there is another thread on this topic, so I am copying those
>> folks from that thread as well.
>>
>>
>>
>> thanks,
>>
>> Robert
>>
>>
>>
>> On 10/16/13 4:32 PM, "Irena Berezovsky" <irenab at mellanox.com> wrote:
>>
>>
>>
>>  Hi,
>>
>> As one of the next steps for PCI pass-through, I would like to discuss
>> the support for PCI pass-through vNICs.
>>
>> While nova takes care of PCI pass-through device resource management
>> and VIF settings, neutron should manage their networking configuration.
>>
>> I would like to register a summit proposal to discuss the support for PCI
>> pass-through networking.
>>
>> I am not sure what would be the right topic to discuss the PCI
>> pass-through networking, since it involves both nova and neutron.
>>
>> There is already a session registered by Yongli on the nova topic to discuss
>> the PCI pass-through next steps.
>>
>> I think PCI pass-through networking is quite a big topic and it is worth
>> having a separate discussion.
>>
>> Are there any other people who are interested in discussing it and sharing
>> their thoughts and experience?
>>
>>
>>
>> Regards,
>>
>> Irena
>>
>>
>>
>>
>>
>>
>>
>