Hi,

We have bug [1] to solve. Basically, when a node which is a backup node for some router comes up, connectivity to the external gateway may be broken for some time. This happens because when the host is up and the L3 agent is configuring the qrouter namespace, it flushes all IPv6 addresses from the qg- interface. As a result, some MLDv2 packets are sent from this interface, which may cause the ToR switch to learn the qg port's MAC address on the wrong (backup) node.

This doesn't happen every time or for every router, because it is a race between the L3 agent and the OVS agent. When the L3 agent creates the qg interface in br-int, it sets tag 4095 on it, and traffic sent with that VLAN tag is always dropped in br-int. So if the L3 agent flushes the IPv6 addresses before the OVS agent wires the port and sets the correct tag on it, all is fine. But if the OVS agent is first, the MLDv2 packets are sent to the wire and connectivity breaks.

There are 2 proposed ways of fixing this:

- [2], which proposes to add some kind of "communication" between the L3 agent and the OVS agent, telling the OVS agent that the tag can be changed only after the L3 agent has finished the IPv6 config. The downside of this solution is that it works for the OVS agent only; the Linuxbridge agent may still hit the same issue. The plus is that after the initial configuration of the router, everything else related to failover is handled by keepalived only, in the same way as it is now.

- [3], which sets the qg NIC to be always DOWN on backup nodes. So when keepalived fails over the active router to a new node, the L3 agent needs to come and switch the interfaces to UP before it will work. The plus of this solution is that it works for both the OVS and Linuxbridge L2 agents (and probably for others too), but the downside is that the failover process is a bit longer and there may potentially be another race condition between the L3 agent and keepalived. Keepalived tries to send gARP packets after the node switches to active, and the first attempt will always fail as the interface is still DOWN. But keepalived will retry those gARPs after some time, and this should be fine if the L3 agent has already brought the interface UP by then.

Both patches have been waiting in gerrit for a pretty long time and I want to bring more visibility to both of them. Please check them; maybe you will have some opinions about which solution is better and which one we should go with.

[1] https://bugs.launchpad.net/neutron/+bug/1859832
[2] https://review.opendev.org/#/c/702856/
[3] https://review.opendev.org/#/c/707406/

-- 
Slawek Kaplonski
Senior software engineer
Red Hat