Open Stack

Thu Oct 16 13:00:11 UTC 2014

On Thu, Oct 16, 2014 at 1:59 PM, Russell Bryant <rbryant at redhat.com> wrote:
> On 10/16/2014 04:29 AM, Florian Haas wrote:
>>>>>> (5) Let monitoring and orchestration services deal with these use
>>>>>> cases and
>>>>>> have Nova simply provide the primitive API calls that it already does
>>>>>> (i.e.
>>>>>> host evacuate).
>>>>>
>>>>> That would arguably lead to an incredible amount of wheel reinvention
>>>>> for node failure detection, service failure detection, etc. etc.
>>>>
>>>> How so? (5) would use existing wheels for monitoring and orchestration
>>>> instead of writing all new code paths inside Nova to do the same thing.
>>>
>>> Right, there may be some confusion here ... I thought you were both
>>> agreeing that the use of an external toolset was a good approach for the
>>> problem, but Florian's last message makes that not so clear ...
>>
>> While one of us (Jay or me) speaking for the other and saying we agree
>> is a distributed consensus problem that dwarfs the complexity of
>> Paxos, *I* for my part do think that an "external" toolset (i.e. one
>> that lives outside the Nova codebase) is the better approach versus
>> duplicating the functionality of said toolset in Nova.
>>
>> I just believe that the toolset that should be used here is
>> Corosync/Pacemaker and not Ceilometer/Heat. And I believe the former
>> approach leads to *much* fewer necessary code changes *in* Nova than
>> the latter.
>
> Have you tried pacemaker_remote yet?  It seems like a better choice for
> this particular case, as opposed to using corosync, due to the potential
> number of compute nodes.

I'll assume that you are *not* referring to running Corosync/Pacemaker
on the compute nodes plus pacemaker_remote in the VMs, because doing
so would blow up the separation between the cloud operator and tenant
space.

Running compute nodes as baremetal extensions of a different
Corosync/Pacemaker cluster (presumably the one that manages the other
Nova services) would potentially be an option, although vendors would
need to buy into this. Ubuntu, for example, currently only ships
pacemaker-remote in universe.

*If* you're running pacemaker_remote on the compute node, though, that
then also opens up the possibility for a compute driver to just dump
the libvirt definition into a VirtualDomain Pacemaker resource,
meaning with a small callout added to Nova, you could also get the
virtual machine monitoring functionality. Bonus: this could eventually
be extended to allow live migration of guests to other compute nodes
in the same cluster, in case you want to shut down a compute node for
maintenance without interrupting your HA guests.
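
To make the callout idea a bit more concrete, here is a minimal sketch of
what such a hook might do. This is only an illustration, not existing Nova
code: the function name, the "vm-" resource naming scheme and the config
directory are all made up, and I'm assuming the cluster is managed with pcs
and that the XML file is readable on whichever node ends up running the
resource. The resource agent (ocf:heartbeat:VirtualDomain) and its config,
hypervisor and migration_transport parameters are the real ones. The sketch
only builds the command; it does not run it.

```python
from pathlib import Path
import tempfile


def virtual_domain_command(instance_id, domain_xml, config_dir,
                           hypervisor="qemu:///system"):
    """Write the libvirt XML to disk and return the pcs command that would
    define a matching VirtualDomain resource (hypothetical helper)."""
    # VirtualDomain reads the domain definition from a file, so dump it first.
    config_path = Path(config_dir) / f"{instance_id}.xml"
    config_path.write_text(domain_xml)

    return [
        "pcs", "resource", "create", f"vm-{instance_id}",
        "ocf:heartbeat:VirtualDomain",
        f"config={config_path}",
        f"hypervisor={hypervisor}",
        # Only relevant for the live migration case mentioned above.
        "migration_transport=ssh",
        "meta", "allow-migrate=true",
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd = virtual_domain_command("instance-0001",
                                     "<domain type='kvm'/>", tmp)
        print(" ".join(cmd))
```

A real callout would of course have to handle updates and deletion of the
resource as well, and would need to decide who owns the guest's lifecycle
once Pacemaker is managing it.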

Cheers,
Florian

[openstack-dev] [Nova] Automatic evacuate
