sriov bonding

Ian Wells ijw.ubuntu at
Sun Mar 3 01:21:57 UTC 2019

On Fri, 1 Mar 2019 at 05:24, Sean Mooney <smooney at> wrote:

> On Fri, 2019-03-01 at 01:59 +0000, Manuel Sopena Ballesteros wrote:
> > Ok,
> >
> > My nic is mellanox connect-4 lx so I was thinking this:
> > Create bonding at the PF level
> > use bond interface as br-ext
> > Do ovs offload
> >
> > That would mean I don't need to change my switch configuration and keep
> my LACP.
> >
> > Does it sounds feasible?
> i think mellanox would be able to comment better but
> i dont think that will work

The whole thing becomes a problem with the L2 interfaces that you get out of
SRIOV, so we need to define what it is that we're actually looking for.

L2 (typically LACP) bonding is conventionally between sets of ports that
are directly connected with a wire.  But in the environment we're talking
about here, we're looking at your TOR switches (with bonding config) -> a
really dumb onboard switch on the NIC -> your VM.  You want your VM to run
a bonding protocol, because you want it to know when its ports are working
and which ports to use.  You want the TOR to be the thing that it's talking
to, typically, because you're typically talking to something off-host and
it's really the TOR uplinks' health you're interested in.  But your nearest
neighbours are ports on the internal switches of two entirely unrelated
SRIOV NICs that can't talk LACP to you.  And even if they could, LACP
wouldn't tell you anything other than the fact that that connection was
broken - which it won't be, because the likely source of failure is one of
the TORs.
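To make the problem concrete, here is roughly what a guest would try with
two VFs that it believes are a bonded pair (interface names are
placeholders; this is a sketch of the failure mode, not a recommendation):

```shell
# Create an 802.3ad (LACP) bond in the guest and enslave both VFs.
ip link add bond0 type bond mode 802.3ad lacp_rate fast
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up

# The LACPDUs end up at the NIC's embedded switch rather than the TOR,
# so negotiation typically never completes - inspect the partner state:
cat /proc/net/bonding/bond0
```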

You *could* talk LACP to the TORs and pretend that intervening switch
doesn't exist, and that's where Manuel's logic comes in - the PF talks
bonding protocols (and pretends it hasn't noticed that intervening switch)
and the VFs don't talk bonding protocols (and yet, pretend that they're
bonded - which is a weird config).  But do you take the VFs down when the
PF link drops?  If you do, then perfectly well connected local VMs can't
talk to each other; and if you don't, then the VM never knows the uplink is
broken.  Normally you'd have two links, one to each TOR, so one link or
TOR going down wouldn't break things - but that dumb switch on the NIC has
only one uplink, so no go.
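For the "take the VFs down when the PF link drops" half of that trade-off,
iproute2 does expose a per-VF link-state knob on NICs whose drivers support
it (the PF name and VF index below are placeholders):

```shell
# "auto": the VF's link state follows the PF's physical link - uplink
# failure is signalled, but local VM-to-VM traffic dies along with it.
ip link set dev enp3s0f0 vf 0 state auto

# "enable": the VF link is forced up - local VMs keep talking to each
# other, but the VM never learns that the uplink is broken.
ip link set dev enp3s0f0 vf 0 state enable
```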

And in all these cases you need, somehow, for the VM to know what it's been
given - one half of a bonded pair of SRIOV interfaces is not a great deal
of use to anyone, so you have to be careful to both give them what they
need (bonded or unbonded) and somehow communicate that fact to them.
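One way to communicate "these two ports belong together" is Nova's device
role tagging - assuming a client and Nova API new enough to accept NIC tags
(microversion 2.42 or later); the tag names and port variables below are
invented for illustration:

```shell
# Tag each port at boot so the guest can identify the pair later.
openstack server create --flavor m1.large --image myimage \
    --nic port-id=$PORT_A,tag=bond-member \
    --nic port-id=$PORT_B,tag=bond-member \
    my-vm

# Inside the guest, the tags appear in the metadata device list:
curl -s http://169.254.169.254/openstack/latest/meta_data.json | \
    python3 -c 'import json,sys; print([d.get("tags") for d in json.load(sys.stdin)["devices"]])'
```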

There are, indeed, other options for running link bonding.  One is that you
can run bonds from a vswitch to the TOR, either LACP on a L2 link or
ECMP+BFD on an L3 link like VXLAN - the latter is better by far,
incidentally, as links come down in milliseconds and the timing for LACP is
much slower.  That just works - networking-vpp will do it just fine - but
if you're using SRIOV and you're using it for reasons of performance you
may not want to go anywhere near a vswitch, no matter how fast.
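The vswitch-side LACP case is standard fare; in plain OVS (as opposed to
networking-vpp's ECMP+BFD approach) it looks roughly like this, with the
bridge and interface names assumed:

```shell
# Bond two uplinks from the external bridge to the TOR pair with LACP.
ovs-vsctl add-bond br-ext bond0 eth0 eth1 lacp=active -- \
    set port bond0 bond_mode=balance-tcp other_config:lacp-time=fast

# Even "fast" LACP means 1s LACPDUs and about 3s to declare a link dead -
# orders of magnitude slower than BFD's millisecond-scale detection.
ovs-appctl bond/show bond0
```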

There's also a model for using ECMP directly to the VMs, which doesn't
require the SRIOV ports to be bonded at L2 and doesn't necessarily involve
Neutron's help to work.  That might be another avenue to explore, but
you're into grown-up networking at that point.
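In the ECMP-to-the-VM model, the guest simply holds a multipath route over
its two unbonded SRIOV ports - a sketch with invented addressing (a real
deployment would pair this with BFD or a routing daemon for failure
detection):

```shell
# Two independent L3 links, one per SRIOV VF, each with its own next hop.
ip addr add 10.0.0.2/24 dev eth0
ip addr add 10.0.1.2/24 dev eth1

# Traffic is spread per-flow across both paths; no L2 bonding involved.
ip route add default \
    nexthop via 10.0.0.1 dev eth0 weight 1 \
    nexthop via 10.0.1.1 dev eth1 weight 1
```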

More information about the openstack-discuss mailing list