[openstack-dev] [Neutron] Alternative approaches for L3 HA

Anna Taraday akamyshnikova at mirantis.com
Fri Feb 10 17:27:05 UTC 2017

Hello everyone!

The L3 HA feature based on Keepalived (VRRP) was implemented in Neutron in
Juno. Over the following cycles it was improved: we performed scale testing
[1] to find weak spots and tried to fix them. The only alternative to L3 HA
with VRRP is router rescheduling performed by the Neutron server, but it is
significantly slower and depends on the control plane.

What issues have we experienced with L3 HA VRRP?

   1. Bugs in Keepalived (bad versions) [2]
   2. Split brain [3]
   3. Complex structure (HA networks, HA interfaces), which actually causes
   races that we were fixing during Liberty, Mitaka and Newton.

None of this is critical, but it makes for a bad experience, and not
everyone is ready (or wants) to use the Keepalived approach.

I think we can make things more flexible. For example, we can allow users to
use external services such as etcd instead of Keepalived to synchronize the
current HA state across agents. I've done several experiments and got
failover times comparable to L3 HA with VRRP. Tooz [4] can be used to
abstract away the concrete backend; for example, it would allow us to use
Zookeeper, Redis and other backends to store HA state.

What do I want to propose?

I want to bring up the idea that Neutron should have some general classes
for L3 HA that would allow using not only Keepalived but also other backends
for HA state. At the very least, this would make it easier to try other
approaches and compare them with the existing ones.
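
To make the idea a bit more concrete, here is a rough sketch of what such
general classes might look like. Everything in it is an assumption for
illustration only: HAStateBackend, InMemoryLeaseBackend and ha_state are
hypothetical names, not existing Neutron or Tooz APIs, and the in-memory
lease only stands in for a key with a TTL in etcd, Zookeeper or Redis.

```python
import abc
import time


class HAStateBackend(abc.ABC):
    """Hypothetical interface: decides which agent hosts the active router."""

    @abc.abstractmethod
    def try_acquire(self, router_id, agent_id, ttl):
        """Return True if agent_id now holds the 'master' lease."""

    @abc.abstractmethod
    def release(self, router_id, agent_id):
        """Give up the lease if agent_id currently holds it."""


class InMemoryLeaseBackend(HAStateBackend):
    """Stand-in for an external store: one expiring lease per router."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # router_id -> (agent_id, expires_at)

    def try_acquire(self, router_id, agent_id, ttl):
        now = self._clock()
        holder = self._leases.get(router_id)
        if holder is None or holder[1] <= now or holder[0] == agent_id:
            # Lease is free, expired, or being renewed by its holder.
            self._leases[router_id] = (agent_id, now + ttl)
            return True
        return False

    def release(self, router_id, agent_id):
        holder = self._leases.get(router_id)
        if holder is not None and holder[0] == agent_id:
            del self._leases[router_id]


def ha_state(backend, router_id, agent_id, ttl=3.0):
    """Called periodically by each agent; returns 'master' or 'backup'."""
    acquired = backend.try_acquire(router_id, agent_id, ttl)
    return "master" if acquired else "backup"
```

In this sketch an agent that stops renewing its lease (for example, because
it died) loses the master role once the TTL expires, so the TTL bounds the
failover time. A Keepalived-based implementation could sit behind the same
interface, which is what would make the approaches easy to compare.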

Does this sound reasonable?

[1] -
[2] - https://bugs.launchpad.net/neutron/+bug/1497272
[3] - https://bugs.launchpad.net/neutron/+bug/1375625
[4] - http://docs.openstack.org/developer/tooz/

Ann Taraday