[openstack-dev] [Nova] Automatic evacuate

Steve Gordon sgordon at redhat.com
Thu Oct 16 14:31:26 UTC 2014



----- Original Message -----
> From: "Florian Haas" <florian at hastexo.com>
> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> 
> On Thu, Oct 16, 2014 at 1:59 PM, Russell Bryant <rbryant at redhat.com> wrote:
> > On 10/16/2014 04:29 AM, Florian Haas wrote:
> >>>>>> (5) Let monitoring and orchestration services deal with these use
> >>>>>> cases and
> >>>>>> have Nova simply provide the primitive API calls that it already does
> >>>>>> (i.e.
> >>>>>> host evacuate).
> >>>>>
> >>>>> That would arguably lead to an incredible amount of wheel reinvention
> >>>>> for node failure detection, service failure detection, etc. etc.
> >>>>
> >>>> How so? (5) would use existing wheels for monitoring and orchestration
> >>>> instead of writing all new code paths inside Nova to do the same thing.
> >>>
> >>> Right, there may be some confusion here ... I thought you were both
> >>> agreeing that the use of an external toolset was a good approach for the
> >>> problem, but Florian's last message makes that not so clear ...
> >>
> >> While one of us (Jay or me) speaking for the other and saying we agree
> >> is a distributed consensus problem that dwarfs the complexity of
> >> Paxos, *I* for my part do think that an "external" toolset (i.e. one
> >> that lives outside the Nova codebase) is the better approach versus
> >> duplicating the functionality of said toolset in Nova.
> >>
> >> I just believe that the toolset that should be used here is
> >> Corosync/Pacemaker and not Ceilometer/Heat. And I believe the former
> >> approach leads to *much* fewer necessary code changes *in* Nova than
> >> the latter.
> >
> > Have you tried pacemaker_remote yet?  It seems like a better choice for
> > this particular case, as opposed to using corosync, due to the potential
> > number of compute nodes.
> 
> I'll assume that you are *not* referring to running Corosync/Pacemaker
> on the compute nodes plus pacemaker_remote in the VMs, because doing
> so would blow up the separation between the cloud operator and tenant
> space.
> 
> Running compute nodes as baremetal extensions of a different
> Corosync/Pacemaker cluster (presumably the one that manages the other
> Nova services)  would potentially be an option, although vendors would
> need to buy into this. Ubuntu, for example, currently only ships
> pacemaker-remote in universe.

This is something we'd be doing *to* OpenStack rather than *in* the OpenStack projects (at least those that deliver code); in fact, that's a large part of the appeal. As such, I don't know that there necessarily has to be one true solution to rule them all. A distribution could deviate as needed, but we would have some - ideally very small - number of "known good" configurations which achieve the stated goal and are well documented.

Thanks,

Steve


