[openstack-dev] [Nova] Automatic evacuate

David Vossel dvossel at redhat.com
Tue Oct 21 18:36:27 UTC 2014



----- Original Message -----
> On Thu, Oct 16, 2014 at 7:48 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> >> While one of us (Jay or me) speaking for the other and saying we agree
> >> is a distributed consensus problem that dwarfs the complexity of
> >> Paxos
> >
> >
> > You've always had a way with words, Florian :)
> 
> I knew you'd like that one. :)
> 
> >>, *I* for my part do think that an "external" toolset (i.e. one
> >>
> >> that lives outside the Nova codebase) is the better approach versus
> >> duplicating the functionality of said toolset in Nova.
> >>
> >> I just believe that the toolset that should be used here is
> >> Corosync/Pacemaker and not Ceilometer/Heat. And I believe the former
> >> approach leads to *much* fewer necessary code changes *in* Nova than
> >> the latter.
> >
> >
> > I agree with you that Corosync/Pacemaker is the tool of choice for
> > monitoring/heartbeat functionality, and is my choice for compute-node-level
> > HA monitoring. For guest-level HA monitoring, I would say use
> > Heat/Ceilometer. For container-level HA monitoring, it looks like fleet or
> > something like Kubernetes would be a good option.
> 
> Here's why I think that's a bad idea: none of these support the
> concept of being subordinate to another cluster.
> 
> Again, suppose a VM stops responding. Then
> Heat/Ceilometer/Kubernetes/fleet would need to know whether the node
> hosting the VM is down or not. Only if the node is up or recovered
> (which Pacemaker would be responsible for) would the VM HA facility be
> able to kick in. Effectively you have two views of the cluster
> membership, and that sort of thing always gets messy. In the HA space
> we're always facing the same issues when a replication facility
> (Galera, GlusterFS, DRBD, whatever) has a different view of the
> cluster membership than the cluster manager itself — which *always*
> happens for a few seconds on any failover, recovery, or fencing event.
> 
> Russell's suggestion, by having remote Pacemaker instances on the
> compute nodes tie in with a Pacemaker cluster on the control nodes,
> does away with that discrepancy.
> 
> > I'm curious to see how the combination of compute-node-level HA and
> > container-level HA tools will work together in some of the proposed
> > deployment architectures (bare metal + docker containers w/ OpenStack and
> > infrastructure services run in a Kubernetes pod or CoreOS fleet).
> 
> I have absolutely nothing against an OpenStack cluster using
> *exclusively* Kubernetes or fleet for HA management, once those have
> reached sufficient maturity.

It's not about reaching sufficient maturity for these two projects. They are
on the wrong path to achieving proper HA. Kubernetes and fleet (I'll throw geard
into the mix as well) do a great job at distributed management of containers.
The difference is that instead of integrating with a proper HA stack (as Nova
is doing), Kubernetes and fleet are attempting their own HA. In doing this,
they've unknowingly blown the scope of their respective projects way beyond
what they originally set out to do.

Here's the problem: HA is both widely misunderstood and deceptively difficult
to achieve. System-wide deterministic failover behavior is not just a matter of
monitoring and restarting failed containers; it also requires fencing and a
single authoritative view of cluster membership, which is exactly the
discrepancy Florian describes above. For Kubernetes and fleet to succeed, they
will need to integrate with a proper HA stack like Pacemaker.
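
For concreteness, here is a rough sketch of what Russell's suggestion looks
like with the pcs tooling: compute nodes run pacemaker_remote and are managed
as remote nodes by the control-plane cluster, with fencing configured so that
recovery only starts once a failed node is confirmed dead. The host names,
addresses, and credentials below are made up for illustration:

  # Manage a compute node as a pacemaker_remote node; it needs no full
  # corosync membership of its own.
  pcs resource create compute-1 ocf:pacemaker:remote \
      server=compute-1.example.com reconnect_interval=60

  # Fencing: the cluster must be able to power off a failed compute node
  # before any instance recovery is attempted (device parameters are
  # hypothetical).
  pcs stonith create fence-compute-1 fence_ipmilan \
      pcmk_host_list=compute-1 ipaddr=10.0.0.101 login=admin passwd=secret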

Below are some presentation slides on how I envision Pacemaker interacting with
container orchestration tools.

https://github.com/davidvossel/phd/blob/master/doc/presentations/HA_Container_Overview_David_Vossel.pdf?raw=true
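
The recovery action that such an integration would eventually trigger is just
the existing Nova evacuate call; for example (the instance ID and target host
are placeholders):

  # Rebuild the instance on another host once its original host has been
  # fenced; --on-shared-storage preserves the instance disk.
  nova evacuate <instance-uuid> compute-2 --on-shared-storage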

-- Vossel

> But just about every significant
> OpenStack distro out there has settled on Corosync/Pacemaker for the
> time being. Let's not shove another cluster manager down their throats
> for little to no real benefit.
> 
> Cheers,
> Florian
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


