[openstack-dev] [Neutron][L2Pop][HA Routers] Request for comments for a possible solution

Gary Kotton gkotton at vmware.com
Thu Dec 18 12:47:36 UTC 2014

On 12/18/14, 2:06 PM, "Mike Kolesnik" <mkolesni at redhat.com> wrote:

>Hi Neutron community members.
>I wanted to query the community about a proposal of how to fix HA routers
>working with L2Population (bug 1365476[1]).
>This bug is important to fix especially if we want to have HA routers and
>routers working together.
>[1] https://bugs.launchpad.net/neutron/+bug/1365476
>What's happening now?
>* HA routers use distributed ports, i.e. the port with the same IP & MAC
>  details is applied on all nodes where an L3 agent is hosting this
>  router.
>* Currently, the port details have a binding pointing to an arbitrary node
>  and this is not updated.
>* L2pop takes this "potentially stale" information and uses it to create:
>  1. A tunnel to the node.
>  2. An FDB entry that directs traffic for that port to that node.
>  3. If ARP responder is on, ARP requests will not traverse the network.
>* Problem is, the master router wouldn't necessarily be running on the
>  reported agent.
>  This means that traffic would not reach the master node but some other
>  node where the router might be running in a different state
>  (standby, fail).
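To make the failure mode above concrete, here is a minimal sketch of how entries derived from a stale binding end up pointing at the wrong node. It is not Neutron code: `Port`, `l2pop_entries`, the host names and the addresses are all hypothetical and only model the three items listed above.

```python
from dataclasses import dataclass


@dataclass
class Port:
    mac: str
    ip: str
    bound_host: str  # host recorded in the port binding (may be stale)


def l2pop_entries(port):
    """Model what would be derived from the binding (hypothetical helper)."""
    return {
        "tunnel_to": port.bound_host,           # 1. tunnel to the bound node
        "fdb": {port.mac: port.bound_host},     # 2. unicast FDB entry
        "arp_responder": {port.ip: port.mac},   # 3. ARP answered locally
    }


# Assumed scenario: the binding says node-a, but the master runs on node-b.
ha_port = Port(mac="fa:16:3e:00:00:01", ip="10.0.0.1", bound_host="node-a")
actual_master = "node-b"

entries = l2pop_entries(ha_port)
misdirected = entries["fdb"][ha_port.mac] != actual_master
print(entries["fdb"], "misdirected:", misdirected)
```

In this model nothing ever corrects the FDB entry, because it is computed from the binding alone and the binding is never updated.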
>What is proposed?
>Basically the idea is not to do L2Pop for HA router ports that reside on
>a tenant network.
>Instead, we would create a tunnel to each node hosting the HA router so
>the normal learning switch functionality would take care of switching the
>traffic to the master router.
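The learning-switch behaviour the proposal relies on can be sketched roughly as follows. This is a toy model, not OVS or Neutron code: `LearningSwitch`, the node names and the MAC are hypothetical, and it assumes the newly elected master emits some traffic (for example a gratuitous ARP) that lets the switch relearn its location.

```python
class LearningSwitch:
    """Toy MAC-learning switch with one tunnel per node hosting the HA router."""

    def __init__(self, tunnels):
        self.tunnels = set(tunnels)  # tunnels to every hosting node
        self.mac_table = {}          # learned MAC -> tunnel

    def receive(self, src_mac, in_tunnel):
        # Learn (or refresh) where a source MAC was last seen.
        self.mac_table[src_mac] = in_tunnel

    def forward(self, dst_mac):
        # Known destination: use the learned tunnel. Unknown: flood all tunnels.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return sorted(self.tunnels)


ROUTER_MAC = "fa:16:3e:00:00:01"
sw = LearningSwitch(["node-a", "node-b"])

before_learning = sw.forward(ROUTER_MAC)  # nothing learned yet, so flood
sw.receive(ROUTER_MAC, "node-a")          # master on node-a sends traffic
to_first_master = sw.forward(ROUTER_MAC)
sw.receive(ROUTER_MAC, "node-b")          # failover: new master is heard from
to_new_master = sw.forward(ROUTER_MAC)
```

The point of the sketch is that no controller update is involved: the forwarding decision follows whichever node last sourced traffic from the router MAC.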

In Neutron we just ensure that the MAC address is unique per network.
Could a duplicate MAC address cause problems here?

>This way no matter where the master router is currently running, the data
>plane would know how to forward traffic to it.
>This solution requires changes on the controller only.
>What's to gain?
>* Data plane only solution, independent of the control plane.
>* Lowest failover time (same as HA routers today).
>* High backport potential:
>  * No APIs changed/added.
>  * No configuration changes.
>  * No DB changes.
>  * Changes localized to a single file and limited in scope.
>What's the alternative?
>An alternative solution would be to have the controller update the port
>binding on the single port so that the plain old L2Pop happens and
>notifies about the location of the master router.
>This basically negates all the benefits of the proposed solution, but is
>This solution depends on the report-ha-router-master spec which is
>currently in the implementation phase.
>It's important to note that these two solutions don't collide and could
>be done independently. The one I'm proposing just makes more sense from
>an HA perspective because of its benefits, which fit the HA methodology
>of being fast & having as little outside dependency as possible.
>It could be done as an initial solution which solves the bug for mechanism
>drivers that support normal learning switch (OVS), and later kept as an
>optimization to the more general, controller based, solution which will
>solve the issue for any mechanism driver working with L2Pop (Linux Bridge,
>Would love to hear your thoughts on the subject.