<div dir="ltr"><div>Document updated to talk about network aware scheduling (<a href="https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit#">https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit#</a> - section just before the use case list). <br>
<br>Based on yesterday's meeting, rkukura would also like to see network-aware scheduling work for non-PCI cases - where servers are not necessarily connected to every physical segment, and machines therefore need placing on hosts from which they can reach the networks they need. I think this is an exact parallel to the PCI case, except that in the PCI case we're also constrained by a count of resources (you can, of course, connect any number of VMs to a software bridge). We should implement the scheduling changes as a separate batch of work that solves both problems, if we can - and this fits with the two-step approach, because step 1 brings us up to Neutron parity and step 2 adds network-aware scheduling for both PCI and non-PCI cases.<br>
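<br>To make the parallel concrete, here's a minimal sketch of the shared placement check - purely illustrative, with made-up names ('reachable_physnets', 'free_pci_per_physnet'); a real version would plug into Nova's scheduler filtering:<br>
<pre>
# Hypothetical placement check: a host is eligible only if every physical
# segment needed by the requested Neutron networks is reachable from it.
# The PCI case is the same check plus a free-device count per segment;
# the software-bridge case needs reachability alone.
def host_passes(reachable_physnets, requested_physnets, free_pci_per_physnet=None):
    for physnet in requested_physnets:
        if physnet not in reachable_physnets:
            return False  # host cannot reach this segment at all
        if free_pci_per_physnet is not None:
            if free_pci_per_physnet.get(physnet, 0) == 0:
                return False  # no unallocated PCI device on this segment
    return True

# Non-PCI case: count constraint dropped.
print(host_passes({'phy1', 'phy2'}, {'phy1'}))                          # True
# PCI case: same host, but no free device left on phy1.
print(host_passes({'phy1', 'phy2'}, {'phy1'}, {'phy1': 0, 'phy2': 4}))  # False
</pre>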
<br>-- <br></div>Ian.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 20 January 2014 13:38, Ian Wells <span dir="ltr"><<a href="mailto:ijw.ubuntu@cack.org.uk" target="_blank">ijw.ubuntu@cack.org.uk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="im">On 20 January 2014 09:28, Irena Berezovsky <span dir="ltr"><<a href="mailto:irenab@mellanox.com" target="_blank">irenab@mellanox.com</a>></span> wrote:<br>
</div><div class="gmail_extra"><div class="gmail_quote"><div class="im">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
Having post PCI meeting discussion with Ian based on his proposal <a href="https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#" target="_blank">https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#</a>,<br>
I am not sure that the case needed for usable SR-IOV based networking is covered well by this proposal. The understanding I got is that a VM can land on a host that lacks a suitable PCI resource.<br></blockquote><div>
<br></div></div><div>The issue arises if we have multiple underlying networks in the system and only some Neutron networks are trunked on the network that the PCI device is attached to. This can specifically happen in the case of provider versus trunk networks, though it's very dependent on the setup of your system.<br>
<br></div><div>The issue is that, in the design we have, Neutron at present has no input into scheduling, and also that all devices in a flavor are precisely equivalent. So if I say 'I want a 10G card attached to network X' I will get one of the cards in the 10G flavor with no regard as to whether it can actually attach to network X.<br>
<br></div><div>I can see two options here:<br><br></div><div>1. What I'd do right now is make it so that a VM that is given an unsuitable network card fails to run in nova-compute when Neutron discovers it can't attach the PCI device to the network. This will get us a lot of use cases and a Neutron driver without solving the problem elegantly. You'd need to choose e.g. a provider or tenant network flavor, mindful of the network you're connecting to, so that Neutron can actually succeed - which requires more visibility into the workings of Neutron than the user really ought to need.<br>
<br></div><div>2. When Nova checks that all the networks exist - which, conveniently, is in nova-api - it also gets attributes from the networks that can be used by the scheduler to choose a device. So the scheduler chooses from a flavor *and*, within that flavor, from a subset of those devices with appropriate connectivity. If we do this, then the Neutron connection code doesn't change - it should still fail if the connection can't be made - but it becomes an internal error, since it's now an issue of consistency of setup.<br>
<br>To do this, I think we would tell Neutron 'PCI extra-info X should be set to Y for this provider network and Z for tenant networks' - the precise implementation would be somewhat up to the driver - and then add the additional check in the scheduler. The scheduling attributes list would have to include that attribute.<br>
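<br>To make that selection step concrete, here's a minimal sketch of what the scheduler would do within a flavor (illustrative only - 'extra_info' and 'pci_requirements' are placeholder names, not an existing Nova structure):<br>
<pre>
# Illustrative only: keep the devices that are both in the requested flavor
# and carry the extra-info attributes nova-api derived from the networks
# being attached (e.g. {'e.physical_network': 'phy1'}).
def candidate_devices(flavor_devices, pci_requirements):
    return [dev for dev in flavor_devices
            if all(dev.get('extra_info', {}).get(attr) == value
                   for attr, value in pci_requirements.items())]

# A 10G flavor whose cards sit on two different trunks:
flavor_devices = [
    {'address': '0000:06:00.1', 'extra_info': {'e.physical_network': 'phy1'}},
    {'address': '0000:07:00.1', 'extra_info': {'e.physical_network': 'phy2'}},
]
# Booting with a port on a 'phy1' network leaves only the first card eligible.
print(candidate_devices(flavor_devices, {'e.physical_network': 'phy1'}))
</pre>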
</div><div class="im"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Can you please provide an example of the required cloud admin PCI-related configuration on the nova-compute and controller nodes with regard to the following simplified scenario:<br>
-- There are 2 provider networks (phy1, phy2), each with an associated range of VLAN IDs<br>
-- Each compute node has 2 vendor adapters with the SR-IOV feature enabled, exposing xx Virtual Functions.<br>
-- Every VM vNIC on a virtual network on provider network phy1 or phy2 should be a PCI passthrough vNIC.<br></blockquote><div><br></div></div><div>So, we would configure Neutron to check the 'e.physical_network' attribute on connection and to return it as a requirement on networks. Any PCI device on provider network 'phy1' would be tagged e.physical_network => 'phy1'. When returning the network, an extra attribute would be supplied (perhaps something like 'pci_requirements => { e.physical_network => 'phy1' }'). And nova-api would know that, in the case of macvtap and PCI directmap, it would need to pass this additional information to the scheduler, which would then use it in finding a device, over and above the flavor requirements.<br>
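<br>As a very rough sketch of the admin-facing data for that scenario - not final syntax, just the shape of what each node would need to know under this proposal (device addresses and vendor_id are placeholders):<br>
<pre>
# Illustrative only - option names and exact syntax are not settled.

# On each nova-compute node: whitelist the VFs of the two adapters and tag
# each with the provider network its PF is trunked to.
compute_pci_whitelist = [
    {'address': '0000:06:00.*', 'e.physical_network': 'phy1'},
    {'address': '0000:07:00.*', 'e.physical_network': 'phy2'},
]

# On the controller: a flavor grouping the passthrough-capable devices.
# The physical_network tag is deliberately not part of the flavor; the
# scheduler adds it per boot from the networks being attached.
pci_flavor = {'name': '10G-passthrough', 'vendor_id': '15b3'}

# What Neutron hands back for a port on a phy1 network, for the scheduler
# to combine with the flavor when picking a concrete VF:
pci_requirements = {'e.physical_network': 'phy1'}
</pre>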
<br>Neutron, when mapping a PCI port, would similarly work out from the Neutron network the trunk it needs to connect to, and would reject any mapping that didn't conform. If it did conform, Neutron would work out how to encapsulate the traffic from the PCI device and set that up on the PF of the port.<br>
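<br>In code terms, something along these lines on the Neutron side - again a sketch only, not the actual driver interface:<br>
<pre>
# Sketch of the Neutron-side consistency check when binding a PCI port:
# the device's tag must match the physical network the Neutron network is
# trunked on, or the binding is refused.
def bind_pci_port(device_extra_info, network_physnet, network_vlan):
    if device_extra_info.get('e.physical_network') != network_physnet:
        # With scheduling in place this becomes an internal error - a
        # setup-consistency problem, not something a user should hit.
        raise RuntimeError('PCI device is not attached to %s' % network_physnet)
    # Otherwise return what the driver needs to program the encapsulation
    # (here, the VLAN tag) on the PF for the passed-through VF.
    return {'physical_network': network_physnet, 'vlan_id': network_vlan}
</pre>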
<br></div><div>I'm not saying this is the only or best solution, but it does have the advantage that it keeps all of the networking behaviour in Neutron - hopefully Nova remains almost completely ignorant of what the network setup is, since the only thing we have to do is pass on PCI requirements, and we already have a convenient call flow we can use that's there for the network existence check.<span class="HOEnZb"><font color="#888888"><br>
</font></span></div><span class="HOEnZb"><font color="#888888"><div>-- <br></div><div>Ian.<br></div></font></span></div></div></div>
</blockquote></div><br></div>