[openstack-dev] [nova] [neutron] Today's meeting log: PCI pass-through network support
yongli he
yongli.he at intel.com
Tue Dec 24 02:08:15 UTC 2013
On 12/24/13 4:50 AM, Robert Li (baoli) wrote:
> Hi Irena,
>
> I agree with you on the following copied from another thread:
>
> I would like to suggest to focus the next PCI-pass through IRC meeting on:
>
> 1.Closing the administration and tenant that powers the VM use cases.
>
> 2.Decouple the nova and neutron parts to start focusing on the neutron
> related details.
>
>
> But so far, we haven't been able to reach agreement on the first part.
> I saw discussions on "nic-flavor" from John. I'd like to know more
> details about it. What exactly it is, how it is defined, etc.
>
> Let's continue our discussion tomorrow. Here is the agenda that I'd
> like to discuss. Folks, please add yours if you have specific things
> to discuss.
>
> -- Auto Discovery
>
> There seems to be agreement that this is something we should have, but
> disagreement about when it's needed.
>
>
> different proposals:
>
> * it can be done later
>
> * the class of a PCI device can determine, in particular, if it's a
> networking device. With that, SRIOV may be achieved in a cloud with
> minimum configuration. Therefore, it should be done now.
>
Auto-discovering the class of a PCI device raises several decisions that need
to be addressed before it can start:
a) Does reading /sys work for everything? Some VFs do not appear in the Linux
device tree, and that is up to the driver.
b) Should this live in libvirt itself or in the libvirt driver? Either could take a long time.
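
For reference, a minimal sketch of what class-based discovery could look like on a host whose VFs do show up in sysfs (the PCI address below is only an illustration):

    # read the PCI class code of a device; a value starting with 0x02 means "network controller"
    cat /sys/bus/pci/devices/0000:01:10.0/class
    # e.g. 0x020000, so this device could be grouped as a NIC automatically

Point a) above is exactly the question of whether this path is populated for every VF with every driver.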
>
>
> -- PCI group (or PCI flavor) and PCI passthrough list (whitelist).
>
>
> * We should finalize the terminology: pci-group or pci-flavor. In the
> rest of the email, I use the term pci-group
>
I'm fine with either one. John's 'group' or your 'pci-group': it's the same thing.
>
> there are a couple of ways to provision them
>
> * by configuration/provisioning on the compute nodes
>
> * by nova APIs. The API can define them per compute node.
>
>
> With regard to configuration
>
> * different ways/formats are suggested
>
>
> With regard to nova PCI group APIs:
>
> different proposals:
>
> * no API at all
>
> * pci-group-create is a good thing to have, but pci-group-update is not
>
> * have both pci-group-create & pci-group-update, and get rid of the
> configuration method. pci-group-update can define PCI devices per
> compute node
>
> * it might be possible to use both the configuration method, and the
> nova APIs with pci-group-update not defining per compute node devices.
>
>
> -- nova boot
>
> different proposals:
>
> * exclusive use of the server flavor to specify PCI device
> requirements, adding new arguments to neutron port-create for PCI
> related information
>
> * adding new optional arguments in the --nic option for SRIOV, while at the
> same time preserving the server flavor approach for generic PCI
> passthrough. In addition, neutron port-create should be enhanced to be
> able to specify PCI related information as well.
>
> * Also there are different opinions on what optional arguments should
> be added in the --nic option.
>
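
As a rough illustration (not agreed syntax; the flavor name and the --nic keys below are placeholders), the two proposals correspond to invocations along these lines:

    # proposal 1: PCI requirement carried entirely by the server flavor
    nova flavor-key sriov.large set 'pci-flavor'='1:privateNIC'
    nova boot --flavor sriov.large --image <image-id> --nic net-id=<net-uuid> vm1

    # proposal 2: SRIOV details carried on the --nic option itself
    nova boot --flavor m1.large --image <image-id> \
        --nic net-id=<net-uuid>,pci-flavor=<flavor>,sriov=<direct|macvtap> vm1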
>
> On the neutron front, yes, we haven't been able to discuss all the
> details yet. We can start dedicating a portion of the meeting time for it.
>
> Time seems to be running out for Icehouse. We need to come to
> agreement ASAP. I will be out from Wednesday until after the new year. I'm
> thinking that to move things forward after the new year, we may need to
> have the IRC meeting on a daily basis until we reach agreement. This
> should be one of our new year's resolutions?
>
> Thanks,
> Robert
>
>
>
> On 12/23/13 8:34 AM, "Irena Berezovsky" <irenab at mellanox.com
> <mailto:irenab at mellanox.com>> wrote:
>
> Hi,
>
> Is there ‘PCI pass-through network’ IRC meeting tomorrow?
>
> BR,
>
> Irena
>
> *From:*Robert Li (baoli) [mailto:baoli at cisco.com]
> *Sent:* Tuesday, December 17, 2013 5:32 PM
> *To:* Sandhya Dasu (sadasu); OpenStack Development Mailing List
> (not for usage questions); Jiang, Yunhong; Irena Berezovsky;
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com <mailto:chris.friesen at windriver.com>;
> Itzik Brown; john at johngarbutt.com <mailto:john at johngarbutt.com>
> *Subject:* Re: [openstack-dev] [nova] [neutron] Today's meeting
> log: PCI pass-through network support
>
> Sorry guys, I didn't #startmeeting before the meeting. But here is
> the log from today's meeting. Updated the subject a bit.
>
> <irenab> baoli: hi
>
> [08:57] <baoli> Hi Irena
>
> [08:57] ==tedross [tedross at nat/redhat/x-culmgvjwkhbnuyww] has
> joined #openstack-meeting-alt
>
> [08:58] <irenab> baoli: unfortunately I cannot participate
> actively today, will try to follow the log and email later to day
>
> [08:59] <baoli> ok
>
> [09:00] ==natishalom [~qicruser at 2.55.138.181
> <mailto:%7Eqicruser at 2.55.138.181>] has joined #openstack-meeting-alt
>
> [09:00] ==HenryG [~HenryG at nat/cisco/x-aesrcihoscocixap] has joined
> #openstack-meeting-alt
>
> [09:00] ==tedross [tedross at nat/redhat/x-culmgvjwkhbnuyww] has left
> #openstack-meeting-alt []
>
> [09:01] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 264 seconds]
>
> [09:01] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:03] <baoli> Hi, is Yongli there?
>
> [09:04] ==yjiang51 [yjiang5 at nat/intel/x-uobnfwflcweybytj] has
> joined #openstack-meeting-alt
>
> [09:04] ==jdob [~jdob at c-50-166-75-72.hsd1.nj.comcast.net
> <mailto:%7Ejdob at c-50-166-75-72.hsd1.nj.comcast.net>] has quit
> [Quit: Leaving]
>
> [09:04] ==jdob_ [~jdob at c-50-166-75-72.hsd1.nj.comcast.net
> <mailto:%7Ejdob at c-50-166-75-72.hsd1.nj.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:04] <yjiang51> baoli: hi
>
> [09:05] ==hajay__
> [~hajay at 99-46-140-220.lightspeed.sntcca.sbcglobal.net
> <mailto:%7Ehajay at 99-46-140-220.lightspeed.sntcca.sbcglobal.net>]
> has joined #openstack-meeting-alt
>
> [09:05] <baoli> yjang: hi
>
> [09:05] <yjiang51> baoli: do we have the meeting?
>
> [09:05] <baoli> Yes, it's on. Hopefully, Yongli will join
>
> [09:06] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 260 seconds]
>
> [09:07] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:07] <yjiang51> baoli: got it and thanks
>
> [09:07] ==natishalom [~qicruser at 2.55.138.181
> <mailto:%7Eqicruser at 2.55.138.181>] has quit [Ping timeout: 252
> seconds]
>
> [09:07] ==heyongli [~yhe at 221.216.132.130
> <mailto:%7Eyhe at 221.216.132.130>] has joined #openstack-meeting-alt
>
> [09:07] <baoli> yhe, HI
>
> [09:08] <heyongli> hello, every one
>
> [09:08] <yjiang51> heyongli: hi
>
> [09:08] <baoli> Hi everyone, let's start
>
> [09:08] ==hajay_ [~hajay at 66.129.239.12
> <mailto:%7Ehajay at 66.129.239.12>] has quit [Ping timeout: 252 seconds]
>
> [09:08] <baoli> Yongli has summarized his wiki with his email
>
> [09:09] <heyongli> i just arrived home from hospital, sorry late
>
> [09:09] ==hajay__
> [~hajay at 99-46-140-220.lightspeed.sntcca.sbcglobal.net
> <mailto:%7Ehajay at 99-46-140-220.lightspeed.sntcca.sbcglobal.net>]
> has quit [Ping timeout: 264 seconds]
>
> [09:10] <baoli> yhe, np. Hopefully, you are well
>
> [09:10] ==lsmola_ [~Ladas at ip-94-112-129-242.net.upcbroadband.cz
> <mailto:%7ELadas at ip-94-112-129-242.net.upcbroadband.cz>] has
> joined #openstack-meeting-alt
>
> [09:10] <heyongli> my son. so i think you might worry about the
> use case right?
>
> [09:10] <baoli> Can we start with pci-flavor/pci-group definition?
> Do we agree that they are the same?
>
> [09:11] <heyongli> in my brain, it's a filter with name, but in
> the flat dict structure, no sub pci-filter
>
> [09:12] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 264 seconds]
>
> [09:12] <baoli> Well, we want to agree conceptually.
>
> [09:12] ==BrianB_ [4066f90e at gateway/web/freenode/ip.64.102.249.14
> <mailto:4066f90e at gateway/web/freenode/ip.64.102.249.14>] has
> joined #openstack-meeting-alt
>
> [09:13] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:13] <heyongli> cause for me it's just the whitelist with a
> name, so conceptually it's simple, it can be described clearly in this way
>
> [09:14] <baoli> Ok. So, they all define a group of devices with
> similar properties.
>
> [09:15] <heyongli> agree
>
> [09:15] <baoli> great
>
> [09:16] <heyongli> any other concern for the flavor?
>
> [09:16] <baoli> Now, it seems to me that pci-flavor can be defined
> by both nova API and by means of configuration
>
> [09:16] <baoli> from your email
>
> [09:16] <heyongli> config is going to fade out
>
> [09:17] <heyongli> for config fade out, any concern?
>
> [09:17] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 245 seconds]
>
> [09:17] <baoli> in your email, what is "admin config sriov"?
>
> [09:17] <heyongli> just mean this step is done by admin
>
> [09:17] ==abramley [~abramley at 69.38.149.98
> <mailto:%7Eabramley at 69.38.149.98>] has joined #openstack-meeting-alt
>
> [09:18] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:18] <heyongli> John wants the picture for the user and for the admin
> to be clearly defined
>
> [09:18] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Remote host closed the connection]
>
> [09:18] ==jdob_ [~jdob at c-50-166-75-72.hsd1.nj.comcast.net
> <mailto:%7Ejdob at c-50-166-75-72.hsd1.nj.comcast.net>] has quit
> [Quit: Leaving]
>
> [09:18] ==jdob [~jdob at c-50-166-75-72.hsd1.nj.comcast.net
> <mailto:%7Ejdob at c-50-166-75-72.hsd1.nj.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:19] <baoli> We have some concerns over phasing out the
> configuration
>
> [09:19] <baoli> Did you check the log from last meeting?
>
> [09:19] <heyongli> i did, but i don't see a strong reason
>
> [09:20] <baoli> How is it in your mind the nova pci-flavor-update
> is going to be used?
>
> [09:20] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:20] <heyongli> just set the whole content for the filter
>
> [09:21] <baoli> Well, I'd like to know who is going to invoke it
> and when
>
> [09:21] <heyongli> totally replace or set the new definition for
> the flavor
>
> [09:21] ==ijw [~ijw at nat/cisco/x-urnealzfvlrtqrbx] has joined
> #openstack-meeting-alt
>
> [09:21] <heyongli> define this, then the device passes the
> whitelist and gets grouped into a flavor
>
> [09:22] <ijw> Sorry I'm late
>
> [09:22] ==banix [banix at nat/ibm/x-bhsigoejtesvdhwi] has joined
> #openstack-meeting-alt
>
> [09:22] <baoli> ijw: np
>
> [09:22] ==eankutse [~Adium at 50.56.230.39
> <mailto:%7EAdium at 50.56.230.39>] has joined #openstack-meeting-alt
>
> [09:22] ==eankutse1 [~Adium at 50.57.17.244
> <mailto:%7EAdium at 50.57.17.244>] has joined #openstack-meeting-alt
>
> [09:22] ==eankutse [~Adium at 50.56.230.39
> <mailto:%7EAdium at 50.56.230.39>] has quit [Read error: No buffer
> space available]
>
> [09:23] <heyongli> this is just the whitelist's DB version, via API
>
> [09:24] <ijw> Apologies for jumping in, but did we do the
> API/no-API discussion yet?
>
> [09:24] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 245 seconds]
>
> [09:24] <heyongli> current topic
>
> [09:25] <baoli> heyongli: let's assume a new compute node is
> added, what do you do to provision it?
>
> [09:25] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:25] <heyongli> 2.1.1 admin check PCI devices present per host
>
> [09:25] <ijw> I would ask, given that Openstack's design tenets
> are all about decentralising where possible, why would you
> centralise the entirety of the PCI information?
>
> [09:26] <ijw> Have to admit I came a bit late to that document -
> because all the work was going on in the other document
>
> [09:26] <ijw> Which didn't mention this at all
>
> [09:26] <heyongli> this is not relevent to tenet, it's admin's work
>
> [09:27] <ijw> It's actually not the problem. It's not that it's
> not relevant to the tenant, it's why you have to actively do
> anything to add a compute node at all. In every other respect a
> compute node joins the cluster with no activity
>
> [09:27] ==yamahata__ [~yamahata at 192.55.55.39
> <mailto:%7Eyamahata at 192.55.55.39>] has quit [Ping timeout: 240
> seconds]
>
> [09:28] <ijw> So, for instance, I boot a compute node, RAM goes
> up, disk goes up, CPUs go up, but I've not had to edit a central
> table to do that, the compute node reports in and it just happens.
>
> [09:28] ==abramley [~abramley at 69.38.149.98
> <mailto:%7Eabramley at 69.38.149.98>] has quit [Quit: abramley]
>
> [09:28] <ijw> I like this - it means when I provision a cluster I
> just have to get each node to provision correctly and the cluster
> is up. Conversely when the node goes down the resources go away.
>
> [09:28] ==yamahata__ [yamahata at nat/intel/x-hvbvnjztdhymckzk] has
> joined #openstack-meeting-alt
>
> [09:28] ==esker [~esker at rrcs-67-79-207-12.sw.biz.rr.com
> <mailto:%7Eesker at rrcs-67-79-207-12.sw.biz.rr.com>] has joined
> #openstack-meeting-alt
>
> [09:29] ==denis_makogon [~dmakogon at 194.213.110.67
> <mailto:%7Edmakogon at 194.213.110.67>] has quit [Ping timeout: 240
> seconds]
>
> [09:29] <heyongli> cause pci-flavor is global, you don't need to
> config it specifically,
>
> [09:29] <ijw> So I would strongly argue that the nodes should
> decide what PCI passthrough devices they have, independently and
> without reference to central authority.
>
> [09:29] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 252 seconds]
>
> [09:30] <ijw> Yes, but that says that all my nodes are either
> identical or similar, and while that may be true it makes more
> sense to keep that configuration on and with the machine rather
> than in a central DB just in case it's not.
>
> [09:30] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:30] <heyongli> ijw: suppose you had 500 servers brought in, all
> with the same configuration, like the same slot for the same pci device
>
> [09:31] <ijw> Yup, then I would boot them up all with the same
> config file on each, same as I install the same software on each.
> That's a devops problem and it's got plenty of solutions.
>
> [09:31] <baoli> heyongli, a pci-flavor is a global name. But
> what's part of a pci-flavor is a matter of the compute host that
> supports that flavor
>
> [09:31] ==julim
> [~julim at pool-173-76-179-202.bstnma.fios.verizon.net
> <mailto:%7Ejulim at pool-173-76-179-202.bstnma.fios.verizon.net>] has
> joined #openstack-meeting-alt
>
> [09:31] ==ruhe [~ruhe at 91.207.132.76
> <mailto:%7Eruhe at 91.207.132.76>] has quit [Ping timeout: 246 seconds]
>
> [09:31] <heyongli> then you get this flow to easily bring them all
> up ready for pci: export the flavor in the aggregate
>
> [09:31] ==shakayumi [~shakayumi at 156.39.10.22
> <mailto:%7Eshakayumi at 156.39.10.22>] has quit [Ping timeout: 250
> seconds]
>
> [09:31] <ijw> heyongli: If I were doing this with puppet, or chef,
> or ansible, or whatever, I would work out what type of host I had
> and put a config on it to suit. This is solving a problem that
> doesn't exist.
>
> [09:32] ==jmaron
> [~jmaron at pool-173-61-178-93.cmdnnj.fios.verizon.net
> <mailto:%7Ejmaron at pool-173-61-178-93.cmdnnj.fios.verizon.net>] has
> joined #openstack-meeting-alt
>
> [09:32] <ijw> And aggregates divide machines by location,
> generally, not type.
>
> [09:32] ==yamahata [~yamahata at i193022.dynamic.ppp.asahi-net.or.jp
> <mailto:%7Eyamahata at i193022.dynamic.ppp.asahi-net.or.jp>] has quit
> [Read error: Connection timed out]
>
> [09:33] ==aignatov [~aignatov at 91.207.132.72
> <mailto:%7Eaignatov at 91.207.132.72>] has quit [Ping timeout: 245
> seconds]
>
> [09:33] <ijw> In summary, do not like. I don't understand why it's
> a good idea to use APIs to describe basic hardware details.
>
> [09:33] <baoli> heyongli: I think that you agreed the aggregate is
> a high level construct. It has nothing to do with how a compute
> node decides what devices belong to which pci-flavor/pci-group
>
> [09:33] <heyongli> i might be wrong, but the aggregate bp says it's a sub
> group of hosts with the same property; that's why the aggregate's metadata
> and scheduler do their work
>
> [09:33] ==denis_makogon [~dmakogon at 194.213.110.67
> <mailto:%7Edmakogon at 194.213.110.67>] has joined #openstack-meeting-alt
>
> [09:33] ==markmcclain
> [~markmccla at c-98-242-72-116.hsd1.ga.comcast.net
> <mailto:%7Emarkmccla at c-98-242-72-116.hsd1.ga.comcast.net>] has
> quit [Quit: Leaving.]
>
> [09:34] ==yamahata [~yamahata at i193022.dynamic.ppp.asahi-net.or.jp
> <mailto:%7Eyamahata at i193022.dynamic.ppp.asahi-net.or.jp>] has
> joined #openstack-meeting-alt
>
> [09:34] ==irenab [c12fa5fb at gateway/web/freenode/ip.193.47.165.251
> <mailto:c12fa5fb at gateway/web/freenode/ip.193.47.165.251>] has quit
> [Ping timeout: 272 seconds]
>
> [09:34] <ijw> Aggregates are there for scheduling, though, not
> provisioning
>
> [09:34] ==natishalom [~qicruser at 62.90.11.161
> <mailto:%7Eqicruser at 62.90.11.161>] has joined #openstack-meeting-alt
>
> [09:34] ==aignatov [~aignatov at 91.207.132.76
> <mailto:%7Eaignatov at 91.207.132.76>] has joined #openstack-meeting-alt
>
> [09:34] <baoli> heyongli: i have no problem with nova
> pci-flavor-create, but with nova pci-flavor-update
>
> [09:34] ==natishalom [~qicruser at 62.90.11.161
> <mailto:%7Eqicruser at 62.90.11.161>] has quit [Client Quit]
>
> [09:34] <baoli> so, aggregate can still work
>
> [09:34] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 248 seconds]
>
> [09:34] <ijw> I have a problem with using APIs and the database to
> do this *at all*.
>
> [09:35] <heyongli> what's that?
>
> [09:35] <ijw> That we shouldn't be storing this information
> centrally. This is exactly what per-host config files are for.
>
> [09:36] <baoli> ijw: let's focus on the API versus configuration.
> Not diverage to use of DB.
>
> [09:36] <ijw> Also, this is not something that changes on a whim,
> it changes precisely and only when the hardware in your cluster
> changes, so it seems to me that using a config file will make that
> happen per the devops comments above, and using APIs is solving a
> problem that doesn't really exist.
>
> [09:37] <heyongli> actually i argued that the aggregate is for
> provisioning, but failed
>
> [09:37] <ijw> baoli: there's no distinction to speak of. The APIs
> clearly change a data model that lives somewhere that is not on
> the individual compute hosts.
>
> [09:38] <ijw> So, why do we need this to be changeable by API at
> all, and why should the information be stored centrally? These are
> the two questions I want answers to for this proposal to make sense.
>
> [09:38] <heyongli> hi, ijw, if we use per-host settings there is still
> a central thing needed: the alias, but the alias is fading out also
>
> [09:39] <ijw> No, you don't, you can work out
> aliases/groups/whatever by what compute hosts report. Only the
> scheduler needs to know it and it can work it out on the fly.
>
> [09:39] <heyongli> so global flavor combined the whitelist and flavor
>
> [09:39] <heyongli> if no global thing, how do you know there is
> 'sth' to be ready for use?
>
> [09:39] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:40] <ijw> That's what the scheduler does. Practically speaking
> you never know if you can schedule a machine until you schedule a
> machine.
>
> [09:40] <yjiang51> ijw: heyongli, I think we need persuade john if
> we have anything different. Is it possible to get John on this
> meeting?
>
> [09:40] <ijw> The only difference in what you're saying is that
> you couldn't validate a launch command against groups when it's
> placed, and that's certainly a weakness, but not a very big one.
>
> [09:41] <heyongli> ijw: no, you must provide your request to the
> scheduler, so how do you want to tell the scheduler what you want?
>
> [09:41] <ijw> Which John?
>
> [09:41] <ijw> extra_specs in the flavor.
>
> [09:41] <ijw> Listing PCI aliases and counts rather than PCI flavors.
>
> [09:42] <ijw> This assumes that your aliases are named by string
> so that you can refer to them (which is an idea I largely stole
> from the way provider network work, btw)
>
> [09:43] <baoli> heyongli: I guess that we didn't do a good job in
> the google doc in describing how the pci-group works. Otherwise,
> it describes exactly why alias is not needed, and pci-group should
> work
>
> [09:43] <ijw> So, in my scheme: 1. you tell the compute host that
> PCI device x is usable by passthrough with flavor 'fred'. You
> schedule a machine requesting one of 'fred' in its flavor, and the
> scheduler finds the host. This is back to the simple mechanism we
> have now, I don't really think it needs complicating.
>
> [09:44] <ijw> Sorry, s/flavor/group/ in the first location that
> last comment.
>
> [09:44] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 240 seconds]
>
> [09:44] ==ruhe [~ruhe at 91.207.132.72
> <mailto:%7Eruhe at 91.207.132.72>] has joined #openstack-meeting-alt
>
> [09:45] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:45] ==heyongli [~yhe at 221.216.132.130
> <mailto:%7Eyhe at 221.216.132.130>] has quit [Ping timeout: 248 seconds]
>
> [09:46] ==esker [~esker at rrcs-67-79-207-12.sw.biz.rr.com
> <mailto:%7Eesker at rrcs-67-79-207-12.sw.biz.rr.com>] has quit
> [Remote host closed the connection]
>
> [09:46] ==esker [~esker at 198.95.226.40
> <mailto:%7Eesker at 198.95.226.40>] has joined #openstack-meeting-alt
>
> [09:47] ==demorris [~daniel.mo at rrcs-67-78-97-126.sw.biz.rr.com
> <mailto:%7Edaniel.mo at rrcs-67-78-97-126.sw.biz.rr.com>] has joined
> #openstack-meeting-alt
>
> [09:47] <ijw> Bad moment time for network trouble…
>
> [09:47] <yjiang51> ijw: yes, seems he lose the connection
>
> [09:48] ==mtreinish
> [~mtreinish at pool-173-62-56-236.pghkny.fios.verizon.net
> <mailto:%7Emtreinish at pool-173-62-56-236.pghkny.fios.verizon.net>]
> has quit [Ping timeout: 272 seconds]
>
> [09:49] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 248 seconds]
>
> [09:50] <yjiang51> ijw: but I agree that if we need to create a pci
> flavor each time to make a compute node's PCI information available,
> that seems not so straightforward.
>
> [09:51] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:51] ==heyongli [~yhe at 221.216.132.130
> <mailto:%7Eyhe at 221.216.132.130>] has joined #openstack-meeting-alt
>
> [09:51] ==mtreinish
> [~mtreinish at pool-173-62-56-236.pghkny.fios.verizon.net
> <mailto:%7Emtreinish at pool-173-62-56-236.pghkny.fios.verizon.net>]
> has joined #openstack-meeting-alt
>
> [09:51] <ijw> Well, turning this around the other way, if you
> described the groups of PCI devices that a compute node was
> offering in the configuration of the compute node, what's the
> problem with that?
>
> [09:52] <heyongli> ijw: np, but the alias was killed during the
> blueprint review
>
> [09:52] <baoli> keep in mind, this is provisioning task on the
> part of compute nodes
>
> [09:52] <heyongli> btw: i lost my connection, so i don't know if you
> saw this, i'll just paste it again:
>
> [09:53] <heyongli> <heyongli> yeah, what's in the extra_spec?
>
> [09:53] <heyongli> <heyongli> currently in the extra spec is
> alias, what would you save in there?
>
> [09:53] <heyongli> <heyongli> no matter what you save there,
> it will be a global thing, or something like the alias as currently
> implemented.
>
> [09:53] <heyongli> <heyongli> you cannot eliminate the global thing
> there, but the room for argument is where it should be defined
>
> [09:53] <heyongli> <heyongli> where it is
>
> [09:53] <heyongli> <heyongli> and another topic/TODO is that the Nova
> community wants to see some code for this design for further evaluation
>
> [09:53] <heyongli> <heyongli> i'm working on it, so we can make some
> progress
>
> [09:53] <baoli> heyongli: it's <pci-flavor:no>
>
> [09:53] ==demorris [~daniel.mo at rrcs-67-78-97-126.sw.biz.rr.com
> <mailto:%7Edaniel.mo at rrcs-67-78-97-126.sw.biz.rr.com>] has quit
> [Ping timeout: 252 seconds]
>
> [09:53] <baoli> sorry <pci-flavor:#of devices>
>
> [09:54] <heyongli> baoli: i'm lost , what do you mean
>
> [09:54] <ijw> heyongli: er, since we're working on two documents I
> don't even know which document review you're talking about.
>
> [09:54] <baoli> in the nova flavor, you can do pci-flavor (or
> pci_group): 2 in the extra_specs
>
> [09:55] <heyongli> ijw: i paste the link there long time ago
>
> [09:55] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 248 seconds]
>
> [09:55] <heyongli> for review, only bp is valid... am i right?
>
> [09:55] <ijw> I think it's fairly reasonable to say that at this
> point 'pci flavor', 'alias' and 'group' are all synonyms.
> Whichever we use we're talking about a PCI device type we want to
> allocate.
>
> [09:55] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:56] <ijw> heyongli: no, not really - this isn't a formal
> process, we're trying to reach agreement here;.
>
> [09:56] <heyongli> ijw: yep, the current in-tree code uses the synonyms:
> whitelist, alias
>
> [09:56] ==demorris [~daniel.mo at 72.32.115.230
> <mailto:%7Edaniel.mo at 72.32.115.230>] has joined #openstack-meeting-alt
>
> [09:57] ==jecarey [jecarey at nat/ibm/x-njofcfftyghvgqwd] has joined
> #openstack-meeting-alt
>
> [09:57] <ijw> What we agree we want: to be able to nominate
> devices by a fairly flexible method on a host (down to host/path
> and as widely as vendor/device) to a specific group; to schedule a
> machine with a combination of device allocations from various
> groups. Right so far?
>
> [09:57] <ijw> I think that's the core of where we agree.
>
> [09:58] ==gokrokve [~gokrokve at c-24-6-222-8.hsd1.ca.comcast.net
> <mailto:%7Egokrokve at c-24-6-222-8.hsd1.ca.comcast.net>] has joined
> #openstack-meeting-alt
>
> [09:58] <heyongli> ijw: right i think, i agree with this, and part of
> this is in tree, except group.
>
> [09:58] <ijw> Beyond that, there are two different proposals, one
> with an API and one which is config driven. How do we choose
> between them?
>
> [09:58] <heyongli> ijw: for me this is a trade off.
>
> [09:59] <ijw> For me, it's not - I see the API as lots more
> complex and also harder to use
>
> [09:59] <heyongli> configuring many, many machines has a scale problem
>
> [09:59] ==chandankumar [chandankum at nat/redhat/x-qhjjbtjvegvuzagq]
> has quit [Quit: Leaving]
>
> [09:59] ==amitgandhi [~amitgandh at 72.32.115.231
> <mailto:%7Eamitgandh at 72.32.115.231>] has joined #openstack-meeting-alt
>
> [10:00] <ijw> But if you're configuring many machines, then
> there's no problem, because you have a deployment system that will
> configure them identically. I do 10 node clusters automatically,
> I'm sure if I have 500 there's going to be no logging into them
> and accidentally typoing the config
>
> [10:00] <baoli> heyongli: it's not really a scale problem in terms
> of provisioning
>
> [10:00] <ijw> So that's a non-problem and I think we should remove
> that from the discussion
>
> [10:00] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has quit
> [Ping timeout: 261 seconds]
>
> [10:01] ==markmcclain
> [~markmccla at c-24-99-84-83.hsd1.ga.comcast.net
> <mailto:%7Emarkmccla at c-24-99-84-83.hsd1.ga.comcast.net>] has
> joined #openstack-meeting-alt
>
> [10:01] <ijw> (Note this is different from host aggregates - I
> might aggregate hosts by physical location of by power strip,
> things I absolutely can't determine automatically, so there's no
> parallel there)
>
> [10:01] ==gokrokve [~gokrokve at c-24-6-222-8.hsd1.ca.comcast.net
> <mailto:%7Egokrokve at c-24-6-222-8.hsd1.ca.comcast.net>] has quit
> [Remote host closed the connection]
>
> [10:01] ==gokrokve [~gokrokve at c-24-6-222-8.hsd1.ca.comcast.net
> <mailto:%7Egokrokve at c-24-6-222-8.hsd1.ca.comcast.net>] has joined
> #openstack-meeting-alt
>
> [10:02] ==SushilKM [~SushilKM at 202.174.93.15
> <mailto:%7ESushilKM at 202.174.93.15>] has quit [Ping timeout: 250
> seconds]
>
> [10:02] ==jcooley_ [~jcooley at c-76-104-157-9.hsd1.wa.comcast.net
> <mailto:%7Ejcooley at c-76-104-157-9.hsd1.wa.comcast.net>] has joined
> #openstack-meeting-alt
>
> [10:03] ==mpanetta [~mpanetta at 72.3.234.177
> <mailto:%7Empanetta at 72.3.234.177>] has joined #openstack-meeting-alt
>
> [10:03] <heyongli> aggregates can be used with pci, but it does not have to
> be this way; without aggregates it should still work.
>
> [10:05] ==denis_makogon [~dmakogon at 194.213.110.67
> <mailto:%7Edmakogon at 194.213.110.67>] has quit [Ping timeout: 240
> seconds]
>
> [10:05] ==flwang1 [~flwang at 106.120.178.5
> <mailto:%7Eflwang at 106.120.178.5>] has joined #openstack-meeting-alt
>
> [10:05] ==denis_makogon [~dmakogon at 194.213.110.67
> <mailto:%7Edmakogon at 194.213.110.67>] has joined #openstack-meeting-alt
>
> [10:06] ==kgriffs [~kgriffs at nexus.kgriffs.com
> <mailto:%7Ekgriffs at nexus.kgriffs.com>] has joined
> #openstack-meeting-alt
>
> [10:06] <kgriffs> o/
>
> [10:06] <amitgandhi> 0/
>
> [10:06] <kgriffs> amitgandhi: you're alive!
>
> [10:06] <flwang1> meeting time?
>
> [10:06] <flaper87> yo yo
>
> [10:06] <amitgandhi> yup made it back in one piece
>
> [10:06] <flwang1> o/
>
> [10:06] ==ametts [~ametts at 72.3.234.177
> <mailto:%7Eametts at 72.3.234.177>] has joined #openstack-meeting-alt
>
> [10:07] <kgriffs> will Malini be here today for the mtg?
>
> [10:08] <ijw> OK, we're out of time, I think we have to take this
> to the list.
>
> [10:09] <ametts> kgriffs: I see her in #cloudqueues. Just pinged her.
>
> [10:09] <ijw> To which end I've just mailed out what I was saying.
>
> On 12/17/13 10:09 AM, "Ian Wells" <ijw.ubuntu at cack.org.uk
> <mailto:ijw.ubuntu at cack.org.uk>> wrote:
>
> Reiterating from the IRC meeting, largely, so apologies.
>
> Firstly, I disagree that
> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
> is an accurate reflection of the current state. It's a very
> unilateral view, largely because the rest of us had been
> focussing on the google document that we've been using for weeks.
>
>
> Secondly, I totally disagree with this approach. This assumes
> that description of the (cloud-internal, hardware) details of
> each compute node is best done with data stored centrally and
> driven by an API. I don't agree with either of these points.
>
> Firstly, the best place to describe what's available on a
> compute node is in the configuration on the compute node. For
> instance, I describe which interfaces do what in Neutron on
> the compute node. This is because when you're provisioning
> nodes, that's the moment you know how you've attached it to
> the network and what hardware you've put in it and what you
> intend the hardware to be for - or conversely your deployment
> puppet or chef or whatever knows it, and Razor or MAAS has
> enumerated it, but the activities are equivalent. Storing it
> centrally distances the compute node from its descriptive
> information for no good purpose that I can see and adds the
> complexity of having to go make remote requests just to start up.
>
> Secondly, even if you did store this centrally, it's not clear
> to me that an API is very useful. As far as I can see, the
> need for an API is really the need to manage PCI device
> flavors. If you want that to be API-managed, then the rest of
> a (rather complex) API cascades from that one choice. Most of
> the things that API lets you change (expressions describing
> PCI devices) are the sort of thing that you set once and only
> revisit when you start - for instance - deploying new hosts in
> a different way.
>
> Consider the parallel in Neutron provider networks. They're config
> driven, largely on the compute hosts. Agents know what ports
> on their machine (the hardware tie) are associated with
> provider networks, by provider network name. The controller
> takes 'neutron net-create ... --provider:network 'name'' and
> uses that to tie a virtual network to the provider network
> definition on each host. What we absolutely don't do is have a
> complex admin API that lets us say 'in host aggregate 4,
> provider network x (which I made earlier) is connected to eth6'.
>
> --
>
> Ian.
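
To make the parallel concrete (interface and VLAN values are just examples): the hardware tie lives in each host's agent configuration, and only the provider network name crosses the API:

    # per-host neutron OVS agent config: the physical tie stays on the host
    bridge_mappings = physnet1:br-eth1

    # controller side: a virtual network refers to the provider network by name only
    neutron net-create tenant-net --provider:network_type vlan \
        --provider:physical_network physnet1 --provider:segmentation_id 100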
>
> On 17 December 2013 03:12, yongli he <yongli.he at intel.com
> <mailto:yongli.he at intel.com>> wrote:
>
> On 12/16/13 10:27 PM, Robert Li (baoli) wrote:
>
> Hi Yongli,
>
> The IRC meeting we have for PCI-Passthrough is the forum
> for discussion on SR-IOV support in openstack. I think the
> goal is to come up with a plan on both the nova and
> neutron side in support of the SR-IOV, and the current
> focus is on the nova side. Since you've done a lot of work
> on it already, would you like to lead tomorrow's
> discussion at UTC 1400?
>
>
> Robert, you lead the meeting very well; i enjoy the setup you made
> for us, keep going on it :-)
>
> I'd like to give you guys a summary of the current state; let's
> discuss it then.
> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
>
>
> 1) fade out the alias (i think this is ok for all)
> 2) the whitelist becomes pci-flavor (i think this is ok for all)
> 3) address simple regular expression support: only * and a
> number range [hex-hex] are supported. (i think this is ok?)
> 4) aggregate: now it's clear enough, and won't impact SRIOV.
> (i think this is irrelevant to SRIOV now)
>
>
> 5) SRIOV use case: if you suggest a use case, please give a
> full example like this: [discuss: compare to other solutions]
>
>
> •create a pci flavor for the SRIOV
>
> nova pci-flavor-create name 'vlan-SRIOV' description "xxxxx"
>
> nova pci-flavor-update UUID set 'description'='xxxx' 'address'= '0000:01:*.7'
>
>
> Admin config SRIOV
>
> •create pci-flavor :
>
> {"name": "privateNIC", "neutron-network-uuid": "uuid-1", ...}
>
> {"name": "publicNIC", "neutron-network-uuid": "uuid-2", ...}
>
> {"name": "smallGPU", "neutron-network-uuid": "", ...}
>
> •set aggregate metadata according to the flavors that exist on the hosts
>
> flavor extra-specs, for a VM that gets two small GPUs and VIFs
> attached from the above SRIOV NICs:
>
> nova aggregate-set-metadata pci-aware-group set 'pci-flavor'='smallGPU,oldGPU, privateNIC,privateNIC'
>
> •create instance flavor for sriov
>
> nova flavor-key 100 set 'pci-flavor'='1:privateNIC; 1:publicNIC; 2:smallGPU,oldGPU'
>
> •User just specifies a quantum port as normal:
>
> nova boot --flavor "sriov-plus-two-gpu" --image img --nic net-id=uuid-2 --nic net-id=uuid-1 vm-name
>
>
>
> Yongli
>
>
>
> Thanks,
>
> Robert
>
> On 12/11/13 8:09 PM, "He, Yongli" <yongli.he at intel.com
> <mailto:yongli.he at intel.com>> wrote:
>
> Hi, all
>
> Please continue to focus on the blueprint; it changed after
> reviewing. And for this point:
>
>
> >5. flavor style for sriov: i just list the flavor style
> in the design but for the style
> > --nic
> > --pci-flavor PowerfullNIC:1
> > still possible to work, so what's the real impact to
> sriov from the flavor design?
>
> >As you can see from the log, Irena has some strong opinions on this, and I tend to
> agree with her. The problem we need to solve is this: we
> need a means to associate a nic (or port) with a PCI
> device that is allocated out of a PCI >flavor or a PCI
> group. We think that we presented a complete solution in
> our google doc.
>
> It’s not so clear; could you please list the key points
> here? Btw, the blueprint I sent Monday has changed for
> this, please check.
>
> Yongli he
>
> *From:*Robert Li (baoli) [mailto:baoli at cisco.com]
> *Sent:* Wednesday, December 11, 2013 10:18 PM
> *To:* He, Yongli; Sandhya Dasu (sadasu); OpenStack
> Development Mailing List (not for usage questions); Jiang,
> Yunhong; Irena Berezovsky; prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; Itzik Brown;
> john at johngarbutt.com <mailto:john at johngarbutt.com>
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> Hi Yongli,
>
> Thank you very much for sharing the Wiki with us on Monday
> so that we have a better understanding on your ideas and
> thoughts. Please see embedded comments.
>
> --Robert
>
> On 12/10/13 8:35 PM, "yongli he" <yongli.he at intel.com
> <mailto:yongli.he at intel.com>> wrote:
>
> On 12/10/13 10:41 PM, Sandhya Dasu (sadasu) wrote:
>
> Hi,
>
> I am trying to resurrect this email thread since
> discussions have split between several threads and
> is becoming hard to keep track.
>
> An update:
>
> New PCI Passthrough meeting time: Tuesdays UTC 1400.
>
> New PCI flavor proposal from Nova:
>
> https://wiki.openstack.org/wiki/PCI_configration_Database_and_API#Take_advantage_of_host_aggregate_.28T.B.D.29
>
> Hi, all
> sorry for missing the meeting, i was looking for John at that
> time. from the log i saw some concerns about the new
> design; i list them here and try to clarify them in my
> opinion:
>
> 1. configuration is going to be deprecated: this might
> impact SRIOV. if possible, please list what kind of
> impact this makes for you.
>
> Regarding the nova API pci-flavor-update, we had a
> face-to-face discussion over use of a nova API to
> provision/define/configure PCI passthrough list during the
> ice-house summit. I kind of like the idea initially. As
> you can see from the meeting log, however, I later thought
> that in a distributed system, using a centralized API to
> define resources per compute node, which could come and go
> any time, doesn't seem to provide any significant benefit.
> This is the reason that I didn't mention it in our google
> doc
> https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit#
> <https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit>
>
> If you agree that pci-flavor and pci-group is kind of the
> same thing, then we agree with you that the
> pci-flavor-create API is needed. Since pci-flavor or
> pci-group is global, then such an API can be used for
> resource registration/validation on nova server. In
> addition, it can be used to facilitate the display of PCI
> devices per node, per group, or in the entire cloud, etc.
>
>
>
> 2. <baoli>So the API seems to be combining the
> whitelist + pci-group
> yeah, it's actually almost the same thing: 'flavor',
> 'pci-group' or 'group'. the real difference is that this
> flavor is going to deprecate the alias, and combine
> tightly with the aggregate or flavor.
>
> Well, with pci-group, we recommended to deprecate the PCI
> alias because we think it is redundant.
>
> We think that specification of PCI requirement in the
> flavor's extra spec is still needed as it's a generic
> means to allocate PCI devices. In addition, it can be used
> as properties in the host aggregate as well.
>
>
>
> 3. feature:
> this design is not to say the feature does not work, but that it
> changed. if the auto discovery feature is possible, we get
> the 'feature' from the device, then use the feature to
> define the pci-flavor. it's also possible to create a
> default pci-flavor for this. so the feature concept
> will be impacted; my feeling is we should have a
> separate bp for the feature, and not in this round of
> changes, so here the only thing is to keep the feature
> possible.
>
> I think that it's ok to have separate BPs. But we think
> that auto discovery is an essential part of the design,
> and therefore it should be implemented with more helping
> hands.
>
>
>
> 4. address regular expression: i'm fine with the
> wild-match style.
>
> Sounds good. One side note is that I noticed that the
> driver for intel 82576 cards has a strange slot assignment
> scheme. So the final definition of it may need to
> accommodate that as well.
>
>
>
> 5. flavor style for sriov: i just list the flavor
> style in the design but for the style
> --nic
> --pci-flavor PowerfullNIC:1
> still possible to work, so what's the real impact to
> sriov from the flavor design?
>
> As you can see from the log, Irena has some strong
> opinions on this, and I tend to agree with her. The
> problem we need to solve is this: we need a means to
> associate a nic (or port) with a PCI device that is
> allocated out of a PCI flavor or a PCI group. We think
> that we presented a complete solution in our google doc.
>
> At this point, I really believe that we should combine our
> efforts and ideas. As far as how many BPs are needed, it
> should be a trivial matter after we have agreed on a
> complete solution.
>
>
>
> Yongli He
>
>
> Thanks,
>
> Sandhya
>
> *From: *Sandhya Dasu <sadasu at cisco.com
> <mailto:sadasu at cisco.com>>
> *Reply-To: *"OpenStack Development Mailing List (not
> for usage questions)"
> <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>
> *Date: *Thursday, November 7, 2013 9:44 PM
> *To: *"OpenStack Development Mailing List (not for
> usage questions)" <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>, "Jiang,
> Yunhong" <yunhong.jiang at intel.com
> <mailto:yunhong.jiang at intel.com>>, "Robert Li (baoli)"
> <baoli at cisco.com <mailto:baoli at cisco.com>>, Irena
> Berezovsky <irenab at mellanox.com
> <mailto:irenab at mellanox.com>>,
> "prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>"
> <prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>>,
> "chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>"
> <chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>>, "He, Yongli"
> <yongli.he at intel.com <mailto:yongli.he at intel.com>>,
> Itzik Brown <ItzikB at mellanox.com
> <mailto:ItzikB at mellanox.com>>
> *Subject: *Re: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> Hi,
>
> The discussions during the summit were very
> productive. Now, we are ready to setup our IRC meeting.
>
> Here are some slots that look like they might work for us.
>
> 1. Wed 2 – 3 pm UTC.
>
> 2. Thursday 12 – 1 pm UTC.
>
> 3. Thursday 7 – 8pm UTC.
>
> Please vote.
>
> Thanks,
>
> Sandhya
>
> *From: *Sandhya Dasu <sadasu at cisco.com
> <mailto:sadasu at cisco.com>>
> *Reply-To: *"OpenStack Development Mailing List (not
> for usage questions)"
> <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>
> *Date: *Tuesday, November 5, 2013 12:03 PM
> *To: *"OpenStack Development Mailing List (not for
> usage questions)" <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>, "Jiang,
> Yunhong" <yunhong.jiang at intel.com
> <mailto:yunhong.jiang at intel.com>>, "Robert Li (baoli)"
> <baoli at cisco.com <mailto:baoli at cisco.com>>, Irena
> Berezovsky <irenab at mellanox.com
> <mailto:irenab at mellanox.com>>,
> "prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>"
> <prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>>,
> "chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>"
> <chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>>, "He, Yongli"
> <yongli.he at intel.com <mailto:yongli.he at intel.com>>,
> Itzik Brown <ItzikB at mellanox.com
> <mailto:ItzikB at mellanox.com>>
> *Subject: *Re: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> Just to clarify, the discussion is planned for 10 AM
> Wednesday morning at the developer's lounge.
>
> Thanks,
>
> Sandhya
>
> *From: *Sandhya Dasu <sadasu at cisco.com
> <mailto:sadasu at cisco.com>>
> *Reply-To: *"OpenStack Development Mailing List (not
> for usage questions)"
> <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>
> *Date: *Tuesday, November 5, 2013 11:38 AM
> *To: *"OpenStack Development Mailing List (not for
> usage questions)" <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>, "Jiang,
> Yunhong" <yunhong.jiang at intel.com
> <mailto:yunhong.jiang at intel.com>>, "Robert Li (baoli)"
> <baoli at cisco.com <mailto:baoli at cisco.com>>, Irena
> Berezovsky <irenab at mellanox.com
> <mailto:irenab at mellanox.com>>,
> "prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>"
> <prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>>,
> "chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>"
> <chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>>, "He, Yongli"
> <yongli.he at intel.com <mailto:yongli.he at intel.com>>,
> Itzik Brown <ItzikB at mellanox.com
> <mailto:ItzikB at mellanox.com>>
> *Subject: *Re: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> *Hi,*
>
> *We are planning to have a discussion at the
> developer's lounge tomorrow morning at 10:00 am.
> Please feel free to drop by if you are interested.*
>
> *Thanks,*
>
> *Sandhya*
>
> *From: *<Jiang>, Yunhong <yunhong.jiang at intel.com
> <mailto:yunhong.jiang at intel.com>>
>
> *Date: *Thursday, October 31, 2013 6:21 PM
> *To: *"Robert Li (baoli)" <baoli at cisco.com
> <mailto:baoli at cisco.com>>, Irena Berezovsky
> <irenab at mellanox.com <mailto:irenab at mellanox.com>>,
> "prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>"
> <prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>>,
> "chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>"
> <chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>>, "He, Yongli"
> <yongli.he at intel.com <mailto:yongli.he at intel.com>>,
> Itzik Brown <ItzikB at mellanox.com
> <mailto:ItzikB at mellanox.com>>
> *Cc: *OpenStack Development Mailing List
> <openstack-dev at lists.openstack.org
> <mailto:openstack-dev at lists.openstack.org>>, "Brian
> Bowen (brbowen)" <brbowen at cisco.com
> <mailto:brbowen at cisco.com>>, "Kyle Mestery (kmestery)"
> <kmestery at cisco.com <mailto:kmestery at cisco.com>>,
> Sandhya Dasu <sadasu at cisco.com <mailto:sadasu at cisco.com>>
> *Subject: *RE: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> Robert, I think your change request for pci alias
> should be covered by the extra infor enhancement.
> https://blueprints.launchpad.net/nova/+spec/pci-extra-info
> and Yongli is working on it.
>
> I’m not sure how the port profile is passed to the
> connected switch; is it a Cisco VM-FEX specific method
> or a libvirt method? Sorry, I’m not well versed on the network side.
>
> --jyh
>
> *From:*Robert Li (baoli) [mailto:baoli at cisco.com]
> *Sent:* Wednesday, October 30, 2013 10:13 AM
> *To:* Irena Berezovsky; Jiang, Yunhong;
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; He, Yongli;
> Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian Bowen
> (brbowen); Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI
> pass-through network support
>
> Hi,
>
> Regarding physical network mapping, This is what I
> thought.
>
> consider the following scenarios:
>
> 1. a compute node with SRIOV only interfaces attached
> to a physical network. the node is connected to one
> upstream switch
>
> 2. a compute node with both SRIOV interfaces and
> non-SRIOV interfaces attached to a physical network.
> the node is connected to one upstream switch
>
> 3. in addition to case 1 &2, a compute node may have
> multiple vNICs that are connected to different
> upstream switches.
>
> CASE 1:
>
> -- the mapping from a virtual network (in terms of
> neutron) to a physical network is actually done by
> binding a port profile to a neutron port. With cisco's
> VM-FEX, a port profile is associated with one or
> multiple vlans. Once the neutron port is bound with
> this port-profile in the upstream switch, it's
> effectively plugged into the physical network.
>
> -- since the compute node is connected to one upstream
> switch, the existing nova PCI alias will be
> sufficient. For example, one can boot a Nova instance
> that is attached to a SRIOV port with the following
> command:
>
> nova boot --flavor m1.large --image <image-id> --nic
> net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
>
> the net-id will be useful in terms of allocating IP
> address, enable dhcp, etc that is associated with the
> network.
>
> -- the pci-alias specified in the nova boot command is
> used to create a PCI request for scheduling purpose. a
> PCI device is bound to a neutron port during the
> instance build time in the case of nova boot. Before
> invoking the neutron API to create a port, an
> allocated PCI device out of a PCI alias will be
> located from the PCI device list object. This device
> info among other information will be sent to neutron
> to create the port.
>
> CASE 2:
>
> -- Assume that OVS is used for the non-SRIOV
> interfaces. An example of configuration with ovs
> plugin would look like:
>
> bridge_mappings = physnet1:br-vmfex
>
> network_vlan_ranges = physnet1:15:17
>
> tenant_network_type = vlan
>
> When a neutron network is created, a vlan is either
> allocated or specified in the neutron net-create
> command. Attaching a physical interface to the bridge
> (in the above example br-vmfex) is an administrative
> task.
>
> -- to create a Nova instance with non-SRIOV port:
>
> nova boot --flavor m1.large --image <image-id> --nic
> net-id=<net>
>
> -- to create a Nova instance with SRIOV port:
>
> nova boot --flavor m1.large --image <image-id> --nic
> net-id=<net>,pci-alias=<alias>,sriov=<direct|macvtap>,port-profile=<profile>
>
> it's essentially the same as in the first case. But
> since the net-id is already associated with a vlan,
> the vlan associated with the port-profile must be
> identical to that vlan. This has to be enforced by
> neutron.
>
> again, since the node is connected to one upstream
> switch, the existing nova PCI alias should be sufficient.
>
> CASE 3:
>
> -- A compute node might be connected to multiple
> upstream switches, with each being a separate network.
> This means SRIOV PFs/VFs are already implicitly
> associated with physical networks. In the non-SRIOV
> case, a physical interface is associated with a
> physical network by plugging it into that network, and
> attaching this interface to the ovs bridge that
> represents this physical network on the compute node.
> In the SRIOV case, we need a way to group the SRIOV
> VFs that belong to the same physical networks. The
> existing nova PCI alias is to facilitate PCI device
> allocation by associating <product_id, vendor_id> with
> an alias name. This will no longer be sufficient. But
> it can be enhanced to achieve our goal. For example,
> the PCI device domain, bus (if their mapping to vNIC
> is fixed across boot) may be added into the alias, and
> the alias name should be corresponding to a list of
> tuples.
>
> Another consideration is that a VF or PF might be used
> on the host for other purposes. For example, it's
> possible for a neutron DHCP server to be bound with a
> VF. Therefore, there needs a method to exclude some
> VFs from a group. One way is to associate an exclude
> list with an alias.
>
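
A purely hypothetical sketch of such an enhanced alias, with every field name here invented only to make the idea concrete:

    {"name": "physnet1-VFs",
     "devices": [{"vendor_id": "8086", "product_id": "10ed", "domain": "0000", "bus": "06"}],
     "exclude": ["0000:06:10.1"]}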
> The enhanced PCI alias can be used to support features
> other than neutron as well. Essentially, a PCI alias
> can be defined as a group of PCI devices associated
> with a feature. I'd think that this should be
> addressed with a separate blueprint.
>
> Thanks,
>
> Robert
>
> On 10/30/13 12:59 AM, "Irena Berezovsky"
> <irenab at mellanox.com <mailto:irenab at mellanox.com>> wrote:
>
> Hi,
>
> Please see my answers inline
>
> *From:*Jiang, Yunhong
> [mailto:yunhong.jiang at intel.com]
> *Sent:* Tuesday, October 29, 2013 10:17 PM
> *To:* Irena Berezovsky; Robert Li (baoli);
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; He, Yongli;
> Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian
> Bowen (brbowen); Kyle Mestery (kmestery); Sandhya
> Dasu (sadasu)
> *Subject:* RE: [openstack-dev] [nova] [neutron]
> PCI pass-through network support
>
> Your explanation of the virtual network and
> physical network is quite clear and should work
> well. We need to change the nova code to achieve it,
> including getting the physical network for the virtual
> network, passing the physical network requirement
> to the filter properties, etc.
>
> */[IrenaB] /*The physical network is already
> available to nova at networking/nova/api as a
> virtual network attribute; it is then passed to the
> VIF driver. We will soon push the fix
> to https://bugs.launchpad.net/nova/+bug/1239606,
> which will provide general support for getting
> this information.
>
> For your port method, you mean we are sure to pass the
> network id to ‘nova boot’ and nova will
> create the port during VM boot, am I right? Also,
> how can nova know that it needs to allocate the PCI
> device for the port? I’d suppose that in an SR-IOV
> NIC environment, the user doesn't need to specify the PCI
> requirement. Instead, the PCI requirement should
> come from the network configuration and image
> properties. Or do you think the user still needs to pass a
> flavor with the pci request?
>
> */[IrenaB] There are two ways to apply the port method.
> One is to pass the network id on nova boot and use the
> default type, as chosen in the neutron config file,
> for the vnic type. The other way is to define the port with
> the required vnic type and other properties if
> applicable, and run ‘nova boot’ with the port id
> argument. Going forward with nova support for PCI
> device awareness, we do need a way to impact the
> scheduler choice to land the VM on a suitable Host with
> an available PCI device that has the required
> connectivity./*
>
> --jyh
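
Illustrating the two ways Irena describes (the vnic-type attribute name below is a placeholder; the exact form was still under discussion):

    # way 1: pass only the network; nova creates the port with the default vnic type
    nova boot --flavor m1.large --image <image-id> --nic net-id=<net-uuid> vm1

    # way 2: pre-create the port with the required vnic type, then boot against it
    neutron port-create <net-uuid> --binding:vnic_type direct
    nova boot --flavor m1.large --image <image-id> --nic port-id=<port-uuid> vm1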
>
> *From:*Irena Berezovsky [mailto:irenab at mellanox.com]
> *Sent:* Tuesday, October 29, 2013 3:17 AM
> *To:* Jiang, Yunhong; Robert Li (baoli);
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; He, Yongli;
> Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian
> Bowen (brbowen); Kyle Mestery (kmestery); Sandhya
> Dasu (sadasu)
> *Subject:* RE: [openstack-dev] [nova] [neutron]
> PCI pass-through network support
>
> Hi Jiang, Robert,
>
> IRC meeting option works for me.
>
> If I understand your question below, you are
> looking for a way to tie up between requested
> virtual network(s) and requested PCI device(s).
> The way we did it in our solution is to map a
> provider:physical_network to an interface that
> represents the Physical Function. Every virtual
> network is bound to the provider:physical_network,
> so the PCI device should be allocated based on
> this mapping. We can map a PCI alias to the
> provider:physical_network.
>
> Another topic to discuss is where the mapping
> between neutron port and PCI device should be
> managed. One way to solve it, is to propagate the
> allocated PCI device details to neutron on port
> creation.
>
> In case there is no qbg/qbh support, VF networking
> configuration should be applied locally on the Host.
>
> The question is when and how to apply the networking
> configuration to the PCI device.
>
> We see the following options:
>
> •It can be done on port creation.
>
> •It can be done when the nova VIF driver is called for
> vNIC plugging. This will require having all
> networking configuration available to the VIF
> driver, or sending a request to the neutron server to
> obtain it.
>
> •It can be done by having a dedicated L2 neutron
> agent on each host that scans for allocated PCI
> devices, retrieves the networking configuration
> from the server, and configures the device. The
> agent will also be responsible for handling update
> requests coming from the neutron server.
>
> For macvtap vNIC type assignment, the networking
> configuration can be applied by a dedicated L2
> neutron agent.
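a minimal sketch of the dedicated-agent option, just to make it concrete; it
assumes the physical-network-to-PF mapping discussed earlier and pushes VLAN
configuration onto a VF with 'ip link', with the two callbacks standing in
for the scan and the query to the neutron server:

    import subprocess
    import time

    # hypothetical host-local mapping: provider:physical_network -> PF netdev
    PHYSNET_TO_PF = {'physnet1': 'eth2'}

    def configure_vf(physnet, vf_index, vlan_id):
        """Apply L2 config to one VF of the PF backing 'physnet'."""
        pf = PHYSNET_TO_PF[physnet]
        subprocess.check_call(['ip', 'link', 'set', pf,
                               'vf', str(vf_index), 'vlan', str(vlan_id)])

    def agent_loop(get_allocated_vfs, get_port_config):
        # get_allocated_vfs() and get_port_config() are placeholders for the
        # "scan for allocated PCI devices" and "retrieve networking
        # configuration from the server" steps described above
        while True:
            for physnet, vf_index, port_id in get_allocated_vfs():
                cfg = get_port_config(port_id)
                configure_vf(physnet, vf_index, cfg['vlan_id'])
            time.sleep(2)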
>
> BR,
>
> Irena
>
> *From:*Jiang, Yunhong
> [mailto:yunhong.jiang at intel.com]
> *Sent:* Tuesday, October 29, 2013 9:04 AM
>
>
> *To:* Robert Li (baoli); Irena Berezovsky;
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; He, Yongli;
> Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian
> Bowen (brbowen); Kyle Mestery (kmestery); Sandhya
> Dasu (sadasu)
> *Subject:* RE: [openstack-dev] [nova] [neutron]
> PCI pass-through network support
>
> Robert, is it possible to have an IRC meeting? I’d
> prefer an IRC meeting because it’s more OpenStack
> style and also keeps the minutes clear.
>
> To your flow, can you give a more detailed example?
> For example, suppose the user specifies the
> instance with a --nic option that specifies a network
> id; how does nova derive the requirement for the
> PCI device? I assume the network id defines
> the switches that the device can connect to, but
> how is that information translated into the PCI
> property requirement? Will this translation happen
> before the nova scheduler makes the host decision?
>
> Thanks
>
> --jyh
>
> *From:*Robert Li (baoli) [mailto:baoli at cisco.com]
> *Sent:* Monday, October 28, 2013 12:22 PM
> *To:* Irena Berezovsky;
> prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>; Jiang,
> Yunhong; chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>; He, Yongli;
> Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian
> Bowen (brbowen); Kyle Mestery (kmestery); Sandhya
> Dasu (sadasu)
> *Subject:* Re: [openstack-dev] [nova] [neutron]
> PCI pass-through network support
>
> Hi Irena,
>
> Thank you very much for your comments. See inline.
>
> --Robert
>
> On 10/27/13 3:48 AM, "Irena Berezovsky"
> <irenab at mellanox.com <mailto:irenab at mellanox.com>>
> wrote:
>
> Hi Robert,
>
> Thank you very much for sharing the
> information regarding your efforts. Can you
> please share your idea of the end to end flow?
> How do you suggest to bind Nova and Neutron?
>
> The end to end flow is actually encompassed in the
> blueprints in a nutshell. I will reiterate it
> below. The binding between Nova and Neutron occurs
> with the neutron v2 API that nova invokes in order
> to provision the neutron services. The vif driver
> is responsible for plugging an instance into
> the networking setup that neutron has created on
> the host.
>
> Normally, one will invoke the "nova boot" API with the
> --nic option to specify the nic with which the
> instance will be connected to the network. It
> currently allows net-id, fixed ip and/or port-id
> to be specified for the option. However, it
> doesn't allow one to specify special networking
> requirements for the instance. Thanks to the nova
> pci-passthrough work, one can specify PCI
> passthrough device(s) in the nova flavor. But it
> doesn't provide a means to tie these PCI devices,
> in the case of Ethernet adapters, to networking
> services. Therefore the idea is actually simple, as
> indicated by the blueprint titles: to provide a
> means to tie SRIOV devices to neutron
> services. A workflow for 'nova boot' would roughly
> look like this:
>
> -- The user specifies networking requirements in the
> --nic option. Specifically for SRIOV, allow the
> following to be specified in addition to the
> existing required information:
>
> . PCI alias
>
> . direct pci-passthrough/macvtap
>
> . port profileid that is compliant with 802.1Qbh
>
> The above information is optional. In its absence,
> the existing behavior remains.
>
> -- If special networking requirements exist, the Nova
> API creates PCI requests in the nova instance type
> for scheduling purposes.
>
> -- Nova scheduler schedules the instance based on
> the requested flavor plus the PCI requests that
> are created for networking.
>
> -- Nova compute invokes neutron services with PCI
> passthrough information if any
>
> -- Neutron performs its normal operations based on
> the request, such as allocating a port, assigning
> ip addresses, etc. Specific to SRIOV, it should
> validate information such as the profileid and
> store it in its db. It's also possible to
> associate a port profileid with a neutron network
> so that the port profileid becomes optional in the
> --nic option. Neutron returns the port information
> to nova, in particular the PCI passthrough
> related information in the port binding object.
> Currently, the port binding object contains the
> following information:
>
> binding:vif_type
>
> binding:host_id
>
> binding:profile
>
> binding:capabilities
>
> -- nova constructs the domain xml and plugs in the
> instance by calling the vif driver. The vif driver
> can build the interface xml based on the port
> binding information.
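a rough sketch of that last step (not the actual vif driver): turning the
port binding information into a libvirt <interface> element; the
binding:profile keys used here (vnic_type, vf_netdev, pci_slot) are
assumptions, not a settled schema:

    def interface_xml(binding):
        """Build a libvirt interface element from (assumed) binding data."""
        profile = binding.get('profile', {})
        if profile.get('vnic_type') == 'macvtap':
            # macvtap in passthrough mode on top of the VF's netdev
            dev = profile['vf_netdev']   # hypothetical key, e.g. 'eth4'
            return ("<interface type='direct'>"
                    "<source dev='%s' mode='passthrough'/>"
                    "<model type='virtio'/>"
                    "</interface>" % dev)
        # direct PCI pass-through of the VF
        domain, bus, slot, func = profile['pci_slot']   # hypothetical key,
        # e.g. ('0x0000', '0x06', '0x10', '0x1')
        return ("<interface type='hostdev' managed='yes'>"
                "<source><address type='pci' domain='%s' bus='%s' "
                "slot='%s' function='%s'/></source>"
                "</interface>" % (domain, bus, slot, func))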
>
> The blueprints you registered make sense. On
> the Nova side, there is a need to bind the
> requested virtual network to the PCI
> device/interface to be allocated as a vNIC.
>
> On the Neutron side, there is a need to
> support networking configuration of the vNIC.
> Neutron should be able to identify the PCI
> device/macvtap interface in order to apply
> configuration. I think it makes sense to
> provide neutron integration via a dedicated
> Modular Layer 2 (ML2) mechanism driver to allow PCI
> pass-through vNIC support along with other
> networking technologies.
>
> I haven't sorted through this yet. A neutron port
> could be associated with a PCI device or not,
> which is a common feature, IMHO. However, an ML2
> driver specific to a particular SRIOV technology
> may be needed.
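for what a technology-specific driver could look like, here is a bare
skeleton assuming the ML2 MechanismDriver interface; the way the PCI details
reach the driver (via binding:profile below) is an assumption:

    from neutron.plugins.ml2 import driver_api as api

    class SriovMechanismDriver(api.MechanismDriver):
        """Skeleton only; method bodies are placeholders."""

        def initialize(self):
            # load driver-specific configuration (e.g. supported vNIC types)
            pass

        def create_port_postcommit(self, context):
            port = context.current
            # hypothetical: pick up the PCI/profileid details carried with
            # the port and hand them to whatever configures the VF
            profile = port.get('binding:profile') or {}
            self._apply_vf_config(port['id'], profile)

        def _apply_vf_config(self, port_id, profile):
            pass  # placeholder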
>
> During the Havana release, we introduced the
> Mellanox Neutron plugin, which enables
> networking via SRIOV pass-through devices or
> macvtap interfaces.
>
> We want to integrate our solution with PCI
> pass-through Nova support. I will be glad to
> share more details if you are interested.
>
> Good to know that you already have an SRIOV
> implementation. I found some information
> online about the mlnx plugin, but need more time
> to get to know it better. And certainly I'm
> interested in knowing its details.
>
> The PCI pass-through networking support is
> planned to be discussed during the summit:
> http://summit.openstack.org/cfp/details/129. I
> think it’s worth drilling down into a more
> detailed proposal and presenting it during the
> summit, especially since it impacts both the nova
> and neutron projects.
>
> I agree. Maybe we can steal some time in that
> discussion.
>
> Would you be interested in collaborating on
> this effort? Would you be interested in
> exchanging more emails or setting up an IRC/WebEx
> meeting this week before the summit?
>
> Sure. If folks want to discuss it before the
> summit, we can schedule a WebEx later this week.
> Otherwise, we can continue the discussion over
> email.
>
> Regards,
>
> Irena
>
> *From:*Robert Li (baoli) [mailto:baoli at cisco.com]
> *Sent:* Friday, October 25, 2013 11:16 PM
> *To:* prashant.upadhyaya at aricent.com
> <mailto:prashant.upadhyaya at aricent.com>; Irena
> Berezovsky; yunhong.jiang at intel.com
> <mailto:yunhong.jiang at intel.com>;
> chris.friesen at windriver.com
> <mailto:chris.friesen at windriver.com>;
> yongli.he at intel.com <mailto:yongli.he at intel.com>
> *Cc:* OpenStack Development Mailing List;
> Brian Bowen (brbowen); Kyle Mestery
> (kmestery); Sandhya Dasu (sadasu)
> *Subject:* Re: [openstack-dev] [nova]
> [neutron] PCI pass-through network support
>
> Hi Irena,
>
> This is Robert Li from Cisco Systems.
> Recently, I was tasked to investigate such
> support for Cisco's systems that support
> VM-FEX, which is an SRIOV technology supporting
> 802.1Qbh. I was able to bring up nova
> instances with SRIOV interfaces and establish
> networking between the instances that
> employ the SRIOV interfaces. Certainly, this
> was accomplished with hacking and some manual
> intervention. Based on this experience and my
> study of the two existing nova
> pci-passthrough blueprints that have been
> implemented and committed into Havana
> (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),
> I registered a couple of blueprints (one on
> the Nova side, the other on the Neutron side):
>
> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
>
> https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov
>
> in order to address SRIOV support in openstack.
>
> Please take a look at them and see if they
> make sense, and let me know if you have any
> comments or questions. We can also discuss this
> at the summit, I suppose.
>
> I noticed that there is another thread on this
> topic, so I'm copying those folks from that thread
> as well.
>
> thanks,
>
> Robert
>
> On 10/16/13 4:32 PM, "Irena Berezovsky"
> <irenab at mellanox.com
> <mailto:irenab at mellanox.com>> wrote:
>
> Hi,
>
> As one of the next steps for PCI
> pass-through, I would like to discuss
> the support for PCI pass-through vNICs.
>
> While nova takes care of PCI pass-through
> device resource management and VIF
> settings, neutron should manage their
> networking configuration.
>
> I would like to register a summit proposal
> to discuss the support for PCI
> pass-through networking.
>
> I am not sure what would be the right
> topic under which to discuss PCI pass-through
> networking, since it involves both nova and
> neutron.
>
> There is already a session registered by
> Yongli on the nova topic to discuss the PCI
> pass-through next steps.
>
> I think PCI pass-through networking is
> quite a big topic and it is worth having a
> separate discussion.
>
> Are there any other people who are
> interested in discussing it and sharing their
> thoughts and experience?
>
> Regards,
>
> Irena
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> <mailto:OpenStack-dev at lists.openstack.org>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>