[openstack-dev] [Neutron] Alternative approaches for L3 HA

Adam Spiers aspiers at suse.com
Fri Feb 24 04:40:28 UTC 2017

Anil Venkata <anilvenkata at redhat.com> wrote:
> On Thu, Feb 23, 2017 at 12:10 AM, Miguel Angel Ajo Pelayo <majopela at redhat.com> wrote:
> > On Wed, Feb 22, 2017 at 11:28 AM, Adam Spiers <aspiers at suse.com> wrote:
> >> With help from others, I have started an analysis of some of the
> >> different approaches to L3 HA:
> >>
> >>     https://ethercalc.openstack.org/Pike-Neutron-L3-HA
> >>
> >> (although I take responsibility for all mistakes ;-)
> Did you test with this patch https://review.openstack.org/#/c/255237/ ? It
> was merged in the Newton cycle.
> With this patch, HA+L2pop doesn't depend on the control plane during failover,
> hence failover should be faster (same as without l2pop).

Thanks Anil!  I've updated the spreadsheet to take this into account.

> >> It would be great if someone from RH or RDO could provide information
> >> on how this RDO (and/or RH OSP?) solution based on Pacemaker +
> >> keepalived works - if so, I volunteer to:
> >>
> >>   - help populate column E of the above sheet so that we can
> >>     understand if there are still remaining gaps in the solution, and
> >>
> >>   - document it (e.g. in the HA guide).  Even if this only ended up
> >>     being considered as a shorter-term solution, I think it's still
> >>     worth documenting so that it's another option available to
> >>     everyone.
> >>
> >> Thanks!
> > I have updated the spreadsheet.

Thanks a lot Miguel and everyone else who contributed to the
spreadsheet so far!

After a very productive meeting this morning at the PTG, I think it is
quite close to completion now, and I am already working with the docs
team on moving it into official documentation, either in the HA Guide
(which I am trying to help maintain) or the Networking Guide.  I don't
have strong opinions on where it should live - if anyone does then
please let us know now.

I also attempted to write up a mini-report summarising this morning's
meeting for future reference; it's (currently) at line 279 onwards of:


but I'll reproduce it here for convenience.

The conclusion, at least as I understand it, is as follows:

- The l3_ha solution is already working pretty well in many
  deployments, especially when coupled with a few extra benefits from
  Pacemaker (although
  https://bugs.launchpad.net/neutron/+bugs?field.tag=l3-ha might
  suggest otherwise ...)

- Some more refinements to this solution could be made to reduce the
  remaining corner cases where failures are not handled well.

- I (and hopefully others) will work towards documenting this solution
  in more detail.

- In the meantime, Ann Taraday and anyone else interested may
  continue out-of-tree experiments with different architectures such
  as tooz/etcd.  It is expected that these would be invasive changes,
  possibly taking at least 1-2 release cycles to stabilise, but they
  might still be worth it.

- If a PoC is submitted for review and looks promising, we can decide
  whether it makes sense to aim to replace the existing keepalived
  solution, or instead offer it as an alternative by introducing
  pluggable L3 drivers.  However, adding a driver abstraction layer
  would also be costly and expand the test matrix, at a time when
  developer resources are scarce.  So there would need to be a
  compelling reason to do this.

I hope that's a reasonably accurate representation of the outcome from
this morning - obviously feel free to submit comments if I missed or
mistook anything.  Thanks for a great meeting!
