[openstack-dev] [Nova] Automatic evacuate

Jay Pipes jaypipes at gmail.com
Wed Oct 15 22:30:17 UTC 2014



On 10/15/2014 04:50 PM, Florian Haas wrote:
> On Wed, Oct 15, 2014 at 9:58 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>> On 10/15/2014 03:16 PM, Florian Haas wrote:
>>>
>>> On Wed, Oct 15, 2014 at 7:20 PM, Russell Bryant <rbryant at redhat.com>
>>> wrote:
>>>>
>>>> On 10/13/2014 05:59 PM, Russell Bryant wrote:
>>>>>
>>>>> Nice timing.  I was working on a blog post on this topic.
>>>>
>>>>
>>>> which is now here:
>>>>
>>>> http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>>
>>>
>>> I am absolutely loving the fact that we are finally having a
>>> discussion in earnest about this. I think this deserves a Design
>>> Summit session.
>>>
>>> If I may weigh in here, let me share what I've seen users do and what
>>> can currently be done, and what may be supported in the future.
>>>
>>> Problem: automatically ensure that a Nova guest continues to run, even
>>> if its host fails.
>>>
>>> (That's the general problem description and I don't need to go into
>>> further details explaining the problem, because Russell has done that
>>> beautifully in his blog post.)
>>>
>>> Now, what are the options?
>>>
>>> (1) Punt and leave it to the hypervisor.
>>>
>>> This essentially means that you must use a hypervisor that already has
>>> HA built in, such as VMware with the vCenter driver. In that scenario,
>>> Nova itself neither deals with HA, nor exposes any HA switches to the
>>> user. Obvious downside: not generic, doesn't work with all
>>> hypervisors, most importantly doesn't work with the most popular one
>>> (libvirt/KVM).
>>>
>>> (2) Deploy Nova nodes in pairs/groups, and pretend that they are one node.
>>>
>>> You can already do that by overriding "host" in nova-compute.conf,
>>> setting resume_guests_state_on_host_boot, and using VIPs with
>>> Corosync/Pacemaker. You can then group these hosts in host aggregates,
>>> and the user's scheduler hint to point a newly scheduled guest to such
>>> a host aggregate becomes, effectively, the "keep this guest running at
>>> all times" flag. Upside: no changes to Nova at all, monitoring,
>>> fencing and recovery for free from Corosync/Pacemaker. Downsides:
>>> requires vendors to automate Pacemaker configuration in deployment
>>> tools (because you really don't want to do those things manually).
>>> Additional downside: you either have some idle hardware, or you might
>>> be overcommitting resources in case of failover.
>>>
>>> (3) Automatic host evacuation.
>>>
>>> Not supported in Nova right now, as Adam pointed out at the top of the
>>> thread, and repeatedly shot down. If someone were to implement this,
>>> it would *still* require that Corosync/Pacemaker be used for
>>> monitoring and fencing of nodes, because re-implementing this from
>>> scratch would be the reinvention of a wheel while painting a bikeshed.
>>>
>>> (4) Per-guest HA.
>>>
>>> This is the idea of just doing "nova boot --keep-this-running", i.e.
>>> setting a per-guest flag that still means the machine is to be kept up
>>> at all times. Again, not supported in Nova right now, and probably
>>> even more complex to implement generically than (3), at the same or
>>> greater cost.
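For concreteness, the pairing setup described in (2) above might look
roughly like this. This is only an illustrative sketch: the logical host
name "compute-pair-1" is made up, and the idea is that both physical
machines in a Pacemaker-managed pair carry the same override, so
whichever one holds the VIP presents itself to Nova as the same compute
host:

```shell
# Sketch of the nova-compute config override for option (2).
# "compute-pair-1" is a hypothetical logical host name; the same
# fragment would go into nova.conf on both members of the pair.
cat <<'EOF'
[DEFAULT]
# Both members of the Pacemaker pair report as one logical Nova host
host = compute-pair-1
# Bring guests back up when the logical host (re)starts after failover
resume_guests_state_on_host_boot = True
EOF
```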
>>>
>>> I have a suggestion to tackle this that I *think* is reasonably
>>> user-friendly while still bearable in terms of Nova development
>>> effort:
>>>
>>> (a) Define a well-known metadata key for a host aggregate, say "ha".
>>> Define that any host aggregate that represents a highly available
>>> group of compute nodes should have this metadata key set.
>>>
>>> (b) Then define a flavor that sets extra_specs "ha=true".
>>>
>>> Granted, this places an additional burden on distro vendors to
>>> integrate highly-available compute nodes into their deployment
>>> infrastructure. But since practically all of them already include
>>> Pacemaker, the additional scaffolding required is actually rather
>>> limited.
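To make (a) and (b) concrete, here is roughly what the workflow could
look like with the current novaclient CLI. Names ("ha-pairs",
"m1.small.ha", "compute-pair-1") are illustrative, and matching flavor
extra_specs against aggregate metadata assumes the scheduler has
AggregateInstanceExtraSpecsFilter enabled:

```shell
# (a) Create a host aggregate of HA-capable compute nodes and tag it
#     with the well-known "ha" metadata key
nova aggregate-create ha-pairs
nova aggregate-set-metadata ha-pairs ha=true
nova aggregate-add-host ha-pairs compute-pair-1

# (b) Create a flavor whose extra_specs steer guests to that aggregate
#     (name/RAM/disk/vCPU values are arbitrary examples)
nova flavor-create m1.small.ha auto 2048 20 1
nova flavor-key m1.small.ha set ha=true

# A guest booted with this flavor effectively carries the
# "keep this guest running at all times" flag
nova boot --flavor m1.small.ha --image cirros my-ha-guest
```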
>>
>>
>> Or:
>>
>> (5) Let monitoring and orchestration services deal with these use cases and
>> have Nova simply provide the primitive API calls that it already does (i.e.
>> host evacuate).
>
> That would arguably lead to an incredible amount of wheel reinvention
> for node failure detection, service failure detection, etc. etc.

How so? (5) would use existing wheels for monitoring and orchestration 
instead of writing all new code paths inside Nova to do the same thing.
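As a sketch of what (5) could mean in practice, an external monitor
(say, a Pacemaker fencing hook; the host name and the ordering here are
illustrative, not a prescribed design) would fence the dead node first
and then drive Nova's existing evacuate primitive:

```shell
# Hypothetical recovery hook for option (5): Pacemaker/Corosync does
# the detection and fencing, Nova just provides the evacuate primitive.
FAILED_HOST=compute-3

# 1. Fence the node (STONITH) so it can no longer touch shared storage
stonith_admin --fence "$FAILED_HOST"

# 2. Rebuild every instance from that host elsewhere; with shared
#    instance storage the disks are preserved
nova host-evacuate --on-shared-storage "$FAILED_HOST"
```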

Best,
-jay


