On Fri, 2019-03-01 at 01:59 +0000, Manuel Sopena Ballesteros wrote:
> Ok,
>
> My NIC is a Mellanox ConnectX-4 Lx, so I was thinking this:
> Create bonding at the PF level
> use bond interface as br-ext
> Do ovs offload
>
> That would mean I don't need to change my switch configuration and keep my LACP.
>
> Does it sound feasible?
I think Mellanox would be able to comment better, but
I don't think that will work.
The whole thing becomes a problem with L2 interfaces that you get out of SRIOV, and so we need to define what it is that we're actually looking for here.
L2 (typically LACP) bonding is conventionally between sets of ports that are directly connected with a wire. But in the environment we're talking about here, we're looking at your TOR switches (with bonding config) -> a really dumb onboard switch on the NIC -> your VM. You want your VM to run a bonding protocol, because you want it to know when its ports are working and which ports to use. You typically want the TOR to be the thing that it's talking to, because you're usually talking to something off-host and it's really the health of the TOR uplinks that you're interested in. But your nearest neighbours are ports on the internal switches of two entirely unrelated SRIOV NICs that can't talk LACP to you. And even if they could, LACP wouldn't tell you anything other than whether that connection was broken - which it won't be, because the likely source of failure is one of the TORs.
You *could* talk LACP to the TORs and pretend that intervening switch doesn't exist, and that's where Manuel's logic comes in - the PF talks bonding protocols (and pretends it hasn't noticed that intervening switch) and the VFs don't talk bonding protocols (and yet pretend that they're bonded - which is a weird config). But do you take the VFs down when the PF link drops? If you do, then perfectly well connected local VMs can't talk to each other; and if you don't, then the VM never knows the uplink is broken. Normally you'd have two links, one to each TOR, so one link or TOR going down wouldn't break things - but that dumb switch on the NIC has only one uplink, so no go.
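To make that dilemma concrete, here's a minimal sketch of the two policies and what each one costs you when the PF uplink fails. The names (Outcome, outcome_when_uplink_fails) are made up for illustration and it models nothing beyond the argument above. As far as I know, the two policies roughly correspond to the "auto" and "enable" VF link states that iproute2 exposes, but treat that mapping as an assumption and check it against your driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    # Can two VMs on the same host still reach each other via the NIC's embedded switch?
    local_vm_traffic_works: bool
    # Does the VM see a link-down event and so learn that the uplink has failed?
    vm_sees_uplink_failure: bool


def outcome_when_uplink_fails(vf_follows_pf: bool) -> Outcome:
    """Illustrative model only: what a VM experiences when the PF's uplink goes down."""
    if vf_follows_pf:
        # VFs are taken down with the PF: the VM notices, but local traffic is cut too.
        return Outcome(local_vm_traffic_works=False, vm_sees_uplink_failure=True)
    # VFs stay up regardless: local traffic survives, but the VM is never told.
    return Outcome(local_vm_traffic_works=True, vm_sees_uplink_failure=False)


for policy in (True, False):
    print(f"vf_follows_pf={policy}: {outcome_when_uplink_fails(policy)}")
```

Neither row gives you both properties at once, which is the whole problem.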
And in all these cases you need, somehow, for the VM to know what it's been given - one half of a bonded pair of SRIOV interfaces is not a great deal of use to anyone, so you have to be careful to both give them what they need (bonded or unbonded) and somehow communicate that fact to them.
There are, indeed, other options for running link bonding. One is that you can run bonds from a vswitch to the TOR, either LACP on an L2 link or ECMP+BFD on an L3 link, as you'd have with VXLAN - the latter is better by far, incidentally, as link failures are detected in milliseconds, whereas LACP's timers are much slower. That just works - networking-vpp will do it just fine - but if you're using SRIOV and you're using it for reasons of performance you may not want to go anywhere near a vswitch, no matter how fast.
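For a rough sense of the timing gap, here's a back-of-envelope sketch. The LACP numbers are the standard timers (a partner is declared dead after three missed LACPDUs, sent every 30 s in slow mode or every 1 s in fast mode). The BFD numbers are just example settings I've picked, not anything from a particular deployment; BFD's detection time is the negotiated interval multiplied by the detect multiplier.

```python
# A partner is timed out after this many missed LACPDUs (standard LACP behaviour).
LACP_TIMEOUT_MULTIPLIER = 3


def lacp_detection_seconds(periodic_interval_s: float) -> float:
    """Worst-case time for LACP to declare its partner gone."""
    return LACP_TIMEOUT_MULTIPLIER * periodic_interval_s


def bfd_detection_seconds(interval_ms: float, detect_mult: int) -> float:
    """BFD detection time: negotiated interval times the detect multiplier."""
    return interval_ms * detect_mult / 1000.0


lacp_slow = lacp_detection_seconds(30.0)    # slow (default) LACP rate
lacp_fast = lacp_detection_seconds(1.0)     # fast LACP rate
bfd_example = bfd_detection_seconds(50, 3)  # assumed example: 50 ms interval, multiplier 3

print(f"LACP slow: {lacp_slow:.0f} s")
print(f"LACP fast: {lacp_fast:.0f} s")
print(f"BFD (50 ms x 3): {bfd_example * 1000:.0f} ms")
```

Even with LACP in fast mode you're looking at seconds, against a few hundred milliseconds or less for BFD with settings like these.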
There's also a model for using ECMP directly to the VMs, which doesn't require the SRIOV ports to be bonded at L2 and doesn't necessarily involve Neutron's help to work. That might be another avenue to explore, but you're into grown-up networking at that point.
--
Ian.