[openstack-dev] [nova] [neutron] PCI pass-through network support

Ian Wells ijw.ubuntu at cack.org.uk
Mon Jan 20 12:38:22 UTC 2014

On 20 January 2014 09:28, Irena Berezovsky <irenab at mellanox.com> wrote:

> Hi,
> Following the post-PCI-meeting discussion with Ian based on his proposal
> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
> I am not sure that the case that is quite usable for SR-IOV based
> networking is covered well by this proposal. The understanding I got is
> that a VM can land on a host that lacks a suitable PCI resource.

The issue arises if we have multiple underlying networks in the system
and only some Neutron networks are trunked on the network that the PCI
device is attached to.  This can specifically happen in the case of
provider versus trunk networks, though it's very dependent on the setup of
your system.

The issue is that, in the design we have, Neutron at present has no input
into scheduling, and all devices in a flavor are treated as precisely
equivalent.  So if I say 'I want a 10G card attached to network X' I will
get one of the devices in the 10G flavor with no regard to whether it can
actually attach to network X.

I can see two options here:

1. What I'd do right now is make a VM that is given an unsuitable network
card fail to run in nova-compute when Neutron discovers it can't attach the
PCI device to the network.  This will get us a lot of use cases and a
Neutron driver without solving the problem elegantly.
You'd need to choose e.g. a provider or tenant network flavor, mindful of
the network you're connecting to, so that Neutron can actually succeed,
which is more visibility into the workings of Neutron than the user really
ought to need.

2. When Nova checks that all the networks exist - which, conveniently, is
in nova-api - it also gets attributes from the networks that can be used by
the scheduler to choose a device.  So the scheduler chooses from a flavor
*and*, within that flavor, from a subset of those devices with appropriate
connectivity.  If we do this then the Neutron connection code doesn't
change - it should still fail if the connection can't be made - but it
becomes an internal error, since it's now an issue of consistency of

To do this, I think we would tell Neutron 'PCI extra-info X should be set
to Y for this provider network and Z for tenant networks' - the precise
implementation would be somewhat up to the driver - and then add the
additional check in the scheduler.  The scheduling attributes list would
have to include that attribute.
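
As a rough illustration of that additional scheduler check (a sketch only,
not Nova code): the PciDevice class and candidate_devices function below are
hypothetical names, and the 'e.physical_network' key is borrowed from the
example further down in this message. The idea is just that a device must
match the flavor and also every attribute the network asks for.

```python
from dataclasses import dataclass, field


@dataclass
class PciDevice:
    # Hypothetical stand-in for a tracked PCI device; not a Nova class.
    address: str
    flavor: str
    extra_info: dict = field(default_factory=dict)


def candidate_devices(devices, flavor, pci_requirements):
    """Return devices in `flavor` whose extra-info meets every requirement."""
    return [
        dev for dev in devices
        if dev.flavor == flavor
        and all(dev.extra_info.get(k) == v for k, v in pci_requirements.items())
    ]


devices = [
    PciDevice("0000:04:00.1", "10G", {"e.physical_network": "phy1"}),
    PciDevice("0000:05:00.1", "10G", {"e.physical_network": "phy2"}),
    PciDevice("0000:06:00.1", "1G", {"e.physical_network": "phy1"}),
]

# Requirements as they might be returned alongside the network (assumed shape).
network_requirements = {"e.physical_network": "phy1"}
matches = candidate_devices(devices, "10G", network_requirements)
print([dev.address for dev in matches])
```

With an empty requirements dict the function falls back to plain flavor
matching, which is the behaviour of the design as it stands today.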

> Can you please provide an example for the required cloud admin PCI related
> configurations on nova-compute and controller node with regards to the
> following simplified scenario:
>  -- There are 2 provider networks (phy1, phy2), each one has associated
> range on vlan-ids
>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
> exposing xx Virtual Functions.
>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>  should be pci pass-through vnic.

So, we would configure Neutron to check the 'e.physical_network' attribute
on connection and to return it as a requirement on networks.  Any PCI
device on provider network 'phy1' would be tagged e.physical_network =>
'phy1'.  When returning the network, an extra attribute would be supplied
(perhaps something like 'pci_requirements => { e.physical_network =>
'phy1'}').  And
nova-api would know that, in the case of macvtap and PCI directmap, it
would need to pass this additional information to the scheduler which would
need to make use of it in finding a device, over and above the flavor

Neutron, when mapping a PCI port, would similarly work out from the Neutron
network the trunk it needs to connect to, and would reject any mapping that
didn't conform. If the mapping did conform, it would work out how to
encapsulate the traffic from the PCI device and set that up on the PF of
the port.
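
The Neutron-side rejection could look something like the sketch below. This
is an assumption about shape only, not the Neutron API: check_mapping,
MappingRejected and the dict layout of the network are all hypothetical, and
the encapsulation setup on the PF is left out entirely.

```python
class MappingRejected(Exception):
    """Raised when a PCI device cannot reach the requested network."""


def check_mapping(network, device_extra_info):
    """Reject a device whose extra-info does not satisfy the network."""
    # 'pci_requirements' is the illustrative attribute name from this message.
    required = network.get("pci_requirements", {})
    mismatches = {
        key: (wanted, device_extra_info.get(key))
        for key, wanted in required.items()
        if device_extra_info.get(key) != wanted
    }
    if mismatches:
        raise MappingRejected(f"network {network['name']}: {mismatches}")


net = {"name": "net-a", "pci_requirements": {"e.physical_network": "phy1"}}

# A device on the right trunk is accepted (no exception is raised).
check_mapping(net, {"e.physical_network": "phy1"})

# A device on the wrong trunk is rejected.
try:
    check_mapping(net, {"e.physical_network": "phy2"})
    rejected = False
except MappingRejected:
    rejected = True
print(rejected)
```

If the scheduler has already filtered on the same attributes, this check
should never fire in practice, which is why a failure here would count as an
internal error rather than a user error.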

I'm not saying this is the only or best solution, but it does have the
advantage that it keeps all of the networking behaviour in Neutron -
hopefully Nova remains almost completely ignorant of what the network setup
is, since the only thing we have to do is pass on PCI requirements, and we
already have a convenient call flow we can use that's there for the network
existence check.