[neutron] How to fix break of connectivity in case of L3 HA after reboot of backup node

Rodolfo Alonso ralonsoh at redhat.com
Fri Mar 20 19:13:04 UTC 2020


Hello:

As Nate and Brian commented, and as I did myself in [2] and [3], I prefer [2]. I understand this is
a fix for OVS only, but:
- It limits the solution to the external GW port plugging process, which is where the problem appears.
- The second solution, as you commented, can introduce a race condition between the L3 agent and
the keepalived process, and a possible delay in the HA switch process.
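
To make the ordering problem concrete, here is a toy model of the race described in the quoted
message below. It is only a sketch and not Neutron code: the class, the function, the event names
and the local VLAN value 1 are all made up for illustration. Only the dead tag 4095 and the
drop-in-br-int behaviour come from the thread. The gate_on_ipv6_done flag stands in, very roughly,
for the idea in [2] of not changing the tag until the L3 agent has finished its IPv6 configuration.

```python
DEAD_VLAN_TAG = 4095  # tag the L3 agent sets on a freshly plugged qg- port (from the thread)
LOCAL_VLAN_TAG = 1    # hypothetical "correct" tag the OVS agent would set


class QgPort:
    """Toy qg- port: packets sent while the dead tag is set are dropped."""

    def __init__(self):
        self.tag = DEAD_VLAN_TAG
        self.leaked = []  # packets that would reach the wire

    def send(self, packet):
        if self.tag != DEAD_VLAN_TAG:
            self.leaked.append(packet)


def simulate(order, gate_on_ipv6_done=False):
    """Replay agent events in the given order and return the leaked packets.

    order: sequence of "l3_flush_ipv6" and "ovs_set_tag" (hypothetical names).
    gate_on_ipv6_done: if True, the tag change is deferred until the IPv6
    flush has happened (a rough stand-in for the approach in [2]).
    """
    port = QgPort()
    ipv6_done = False
    tag_pending = False
    for event in order:
        if event == "l3_flush_ipv6":
            # Flushing IPv6 addresses makes the kernel emit MLDv2 packets.
            port.send("MLDv2 report")
            ipv6_done = True
            if tag_pending:
                port.tag = LOCAL_VLAN_TAG
                tag_pending = False
        elif event == "ovs_set_tag":
            if gate_on_ipv6_done and not ipv6_done:
                tag_pending = True  # wait for the L3 agent
            else:
                port.tag = LOCAL_VLAN_TAG
        else:
            raise ValueError("unknown event: %s" % event)
    return port.leaked


if __name__ == "__main__":
    print(simulate(["l3_flush_ipv6", "ovs_set_tag"]))
    print(simulate(["ovs_set_tag", "l3_flush_ipv6"]))
    print(simulate(["ovs_set_tag", "l3_flush_ipv6"], gate_on_ipv6_done=True))
```

In this model a packet leaks only when the OVS agent wins the race and nothing holds back the tag
change; with the gate enabled, both orderings keep the MLDv2 packet on the dead tag.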

Regards.


On Fri, 2020-03-20 at 14:40 -0400, Brian Haley wrote:
> On 3/20/20 11:57 AM, Nate Johnston wrote:
> > On Fri, Mar 20, 2020 at 03:37:49PM +0100, Slawek Kaplonski wrote:
> > > Hi,
> > > 
> > > We have bug [1] to solve. Basically, when a node which is the backup node for some
> > > router is rebooted, connectivity to the external gateway may be broken for some time.
> > > This happens because when the host is up and the L3 agent is configuring the qrouter
> > > namespace, it flushes all IPv6 addresses from the qg- interface. Due to that, some MLDv2
> > > packets are sent from this interface, which may cause the ToR switch to learn the
> > > qg port's MAC address on the wrong (backup) node.
> > > 
> > > This doesn't happen every time or for every router because it is a race between the
> > > L3 agent and the OVS agent. When the L3 agent creates the qg interface in br-int, it sets
> > > tag 4095 on it, and traffic sent with that VLAN tag is always dropped in br-int.
> > > So if the L3 agent flushes the IPv6 addresses before the OVS agent wires the port and
> > > sets the correct tag for it, then all is fine. But if the OVS agent is first, then the MLDv2
> > > packets are sent to the wire and there is this connectivity break.
> > > 
> > > There are 2 proposed ways of fixing this:
> > >   - [2] which proposes to add some kind of "communication" between the L3 agent and
> > >     the OVS agent, telling the OVS agent that the tag can be changed only after the IPv6
> > >     config is finished by the L3 agent.
> > >     The downside of this solution is that it works for the OVS agent only; the Linuxbridge
> > >     agent may still hit the same issue. The plus is that after the initial
> > >     configuration of the router, everything else related to failover is handled
> > >     by keepalived only, in the same way as it is now.
> > >   - [3] which sets the qg NIC to be always DOWN on backup nodes. So when keepalived
> > >     fails over the active router to a new node, the L3 agent needs to come and switch
> > >     the interfaces to UP before it will work.
> > >     The plus of this solution is that it works for both the OVS and
> > >     Linuxbridge L2 agents (and probably for others too), but the downside is that
> > >     the failover process is a bit longer and there may potentially be another race
> > >     condition between the L3 agent and keepalived. Keepalived tries to send gARP
> > >     packets after switching the node to active, and the first attempt will always fail as
> > >     the interface is still DOWN. But keepalived will retry those gARPs after some
> > >     time, and this should be fine if the L3 agent has already brought the interface
> > >     UP by then.
> > 
> > Personally I find [2] more appealing.  I think that if we find many linuxbridge
> > users hitting this issue then we can replicate the solution for linuxbridge at
> > that time, but until then let's not worry about it - the majority of users use
> > OVS.  And the gARP time gap in solution [3] seems to me like a possibility for
> > problems or downtime.
> 
> I would agree, it seemed easier to understand to me as well.
> 
> -Brian
> 
> > > Both patches have been waiting in Gerrit for a pretty long time and I want to bring more
> > > visibility to both of them. Please check them; maybe you will have some
> > > opinions about which solution would be better and which we should go with.
> > > 
> > > [1] https://bugs.launchpad.net/neutron/+bug/1859832
> > > [2] https://review.opendev.org/#/c/702856/
> > > [3] https://review.opendev.org/#/c/707406/
> > > 
> > > -- 
> > > Slawek Kaplonski
> > > Senior software engineer
> > > Red Hat
> > > 
> > > 
