[openstack-dev] [nova] [neutron] PCI pass-through network support

Ian Wells ijw.ubuntu at cack.org.uk
Tue Jan 21 10:46:25 UTC 2014

Document updated to talk about network-aware scheduling (section just
before the use case list).

Based on yesterday's meeting, rkukura would also like to see network-aware
scheduling work for non-PCI cases - where servers are not necessarily
connected to every physical segment and machines therefore need placing
based on where they can reach the networks they need.  I think this is an
exact parallel to the PCI case, except that in the PCI case we're also
constrained by a count of resources (you can connect an infinite number of
VMs to a software bridge, of course).  We should implement the scheduling
changes as a separate batch of work that solves both problems, if we can -
and this works with the two-step approach, because step 1 brings us up to
Neutron parity and step 2 will add network-aware scheduling for both PCI
and non-PCI cases.
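
To make the parallel concrete, here is a minimal Python sketch of the
host-level check.  It is illustrative only: `Host`, `reachable_physnets`,
`free_pci_by_physnet` and `host_passes` are made-up names, not existing
Nova scheduler code.  The non-PCI case only asks whether the host can reach
the physical network; the PCI case additionally needs a free device on it.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical view of a compute host as a scheduler might see it."""
    name: str
    # Physical networks this host is wired to (non-PCI reachability).
    reachable_physnets: set = field(default_factory=set)
    # Free passthrough devices per physical network (PCI case, counted).
    free_pci_by_physnet: dict = field(default_factory=dict)


def host_passes(host, physnet, needs_pci):
    """Return True if the host can serve one port on `physnet`."""
    if physnet not in host.reachable_physnets:
        return False
    if needs_pci:
        # PCI devices are a counted resource; a software bridge is not.
        return host.free_pci_by_physnet.get(physnet, 0) > 0
    return True


hosts = [
    Host("node1", {"phy1"}, {"phy1": 2}),
    Host("node2", {"phy1", "phy2"}, {"phy1": 0, "phy2": 4}),
]

# Software-bridge port on phy1: both hosts qualify.
bridge_ok = [h.name for h in hosts if host_passes(h, "phy1", needs_pci=False)]
# Passthrough port on phy1: node2 has no free device left there.
pci_ok = [h.name for h in hosts if host_passes(h, "phy1", needs_pci=True)]
```

The only difference between the two cases in this sketch is the extra
count check, which is why one batch of scheduler work could cover both.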


On 20 January 2014 13:38, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:

> On 20 January 2014 09:28, Irena Berezovsky <irenab at mellanox.com> wrote:
>> Hi,
>> Having had a post-PCI-meeting discussion with Ian based on his proposal
>> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
>> ,
>> I am not sure that the case that is quite usable for SR-IOV based
>> networking is covered well by this proposal. The understanding I got is
>> that a VM can land on a host that lacks a suitable PCI resource.
> The issue we have is if we have multiple underlying networks in the system
> and only some Neutron networks are trunked on the network that the PCI
> device is attached to.  This can specifically happen in the case of
> provider versus trunk networks, though it's very dependent on the setup of
> your system.
> The issue is that, in the design we have, Neutron at present has no input
> into scheduling, and also that all devices in a flavor are precisely
> equivalent.  So if I say 'I want a 10G card attached to network X' I will
> get one of the cards in the 10G flavor with no regard as to whether it can
> actually attach to network X.
> I can see two options here:
> 1. What I'd do right now is make a VM that is given an unsuitable network
> card fail to run in nova-compute when Neutron discovers it can't attach
> the PCI device to the network.  This will get us a lot of use cases and a
> Neutron driver without solving the problem elegantly.
> You'd need to choose e.g. a provider or tenant network flavor, mindful of
> the network you're connecting to, so that Neutron can actually succeed,
> which is more visibility into the workings of Neutron than the user really
> ought to need.
> 2. When Nova checks that all the networks exist - which, conveniently, is
> in nova-api - it also gets attributes from the networks that can be used by
> the scheduler to choose a device.  So the scheduler chooses from a flavor
> *and*, within that flavor, from a subset of those devices with appropriate
> connectivity.  If we do this then the Neutron connection code doesn't
> change - it should still fail if the connection can't be made - but it
> becomes an internal error, since it's now an issue of consistency of
> setup.
> To do this, I think we would tell Neutron 'PCI extra-info X should be set
> to Y for this provider network and Z for tenant networks' - the precise
> implementation would be somewhat up to the driver - and then add the
> additional check in the scheduler.  The scheduling attributes list would
> have to include that attribute.
>> Can you please provide an example for the required cloud admin PCI related
>> configurations on nova-compute and controller node with regards to the
>> following simplified scenario:
>>  -- There are 2 provider networks (phy1, phy2), each one has associated
>> range on vlan-ids
>>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
>> exposing xx Virtual Functions.
>>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>>  should be pci pass-through vnic.
> So, we would configure Neutron to check the 'e.physical_network' attribute
> on connection and to return it as a requirement on networks.  Any PCI
> device on provider network 'phy1' would be tagged
> e.physical_network => 'phy1'.  When returning the network, an extra
> attribute would be supplied (perhaps something like
> 'pci_requirements => { e.physical_network => 'phy1' }').  And nova-api
> would know that, in the case of macvtap and PCI directmap, it would need
> to pass this additional information to the scheduler, which would need to
> make use of it in finding a device, over and above the flavor
> requirements.
> Neutron, when mapping a PCI port, would similarly work out from the
> Neutron network the trunk it needs to connect to, and would reject any
> mapping that didn't conform. If the mapping did conform, it would work out
> how to encapsulate the traffic from the PCI device and set that up on the
> PF of the port.
> I'm not saying this is the only or best solution, but it does have the
> advantage that it keeps all of the networking behaviour in Neutron -
> hopefully Nova remains almost completely ignorant of what the network setup
> is, since the only thing we have to do is pass on PCI requirements, and we
> already have a convenient call flow we can use that's there for the network
> existence check.
> --
> Ian.
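
A minimal Python sketch of the device selection described in the quoted
mail, for illustration only.  The dictionary shapes and the names
`PciDevice`, `pick_device` and `pci_requirements` are assumptions based on
the wording above, not an agreed Nova or Neutron API.  The flavor narrows
the pool first, then the requirements returned with the network narrow it
further.

```python
from dataclasses import dataclass, field


@dataclass
class PciDevice:
    """Hypothetical record for one passthrough device on a host."""
    address: str
    flavor: str
    # Free-form tags set by the admin, e.g. {"e.physical_network": "phy1"}.
    extra_info: dict = field(default_factory=dict)


def pick_device(devices, flavor, pci_requirements):
    """Return the first device in `flavor` whose extra_info satisfies
    every key/value in `pci_requirements`, or None if there is none."""
    for dev in devices:
        if dev.flavor != flavor:
            continue
        if all(dev.extra_info.get(k) == v for k, v in pci_requirements.items()):
            return dev
    return None


devices = [
    PciDevice("0000:03:10.0", "10G", {"e.physical_network": "phy2"}),
    PciDevice("0000:04:10.0", "10G", {"e.physical_network": "phy1"}),
]

# What nova-api might receive alongside a network on provider network phy1.
network = {"name": "net-a", "pci_requirements": {"e.physical_network": "phy1"}}

chosen = pick_device(devices, "10G", network["pci_requirements"])
missing = pick_device(devices, "10G", {"e.physical_network": "phy3"})
```

With an empty requirements dict, `pick_device` reduces to a flavor-only
choice, so the extra check only matters when the network supplies
requirements.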