[openstack-dev] [Nova] Automatic evacuate

Florian Haas florian at hastexo.com
Wed Oct 15 20:50:29 UTC 2014


On Wed, Oct 15, 2014 at 9:58 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> On 10/15/2014 03:16 PM, Florian Haas wrote:
>>
>> On Wed, Oct 15, 2014 at 7:20 PM, Russell Bryant <rbryant at redhat.com>
>> wrote:
>>>
>>> On 10/13/2014 05:59 PM, Russell Bryant wrote:
>>>>
>>>> Nice timing.  I was working on a blog post on this topic.
>>>
>>>
>>> which is now here:
>>>
>>> http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>
>>
>> I am absolutely loving the fact that we are finally having a
>> discussion in earnest about this. I think this deserves a Design
>> Summit session.
>>
>> If I may weigh in here, let me share what I've seen users do, what
>> can currently be done, and what may be supported in the future.
>>
>> Problem: automatically ensure that a Nova guest continues to run, even
>> if its host fails.
>>
>> (That's the general problem description and I don't need to go into
>> further details explaining the problem, because Russell has done that
>> beautifully in his blog post.)
>>
>> Now, what are the options?
>>
>> (1) Punt and leave it to the hypervisor.
>>
>> This essentially means that you must use a hypervisor that already has
>> HA built in, such as VMware with the vCenter driver. In that scenario,
>> Nova itself neither deals with HA nor exposes any HA switches to the
>> user. Obvious downside: it's not generic, it doesn't work with all
>> hypervisors, and most importantly it doesn't work with the most
>> popular one (libvirt/KVM).
>>
>> (2) Deploy Nova nodes in pairs/groups, and pretend that they are one node.
>>
>> You can already do that by overriding "host" in nova-compute.conf (so
>> that both nodes of a pair report to Nova as the same compute host),
>> setting resume_guests_state_on_host_boot, and using VIPs with
>> Corosync/Pacemaker. You can then group these hosts in host aggregates,
>> and a user's scheduler hint pointing a newly scheduled guest at such a
>> host aggregate effectively becomes the "keep this guest running at all
>> times" flag. Upside: no changes to Nova at all; monitoring, fencing
>> and recovery come for free from Corosync/Pacemaker. Downside: it
>> requires vendors to automate Pacemaker configuration in their
>> deployment tools (because you really don't want to do those things
>> manually). Additional downside: you either have some idle hardware, or
>> you might be overcommitting resources in case of failover.
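>>
>> A minimal sketch of what such a pairing could look like (hostnames,
>> addresses and resource names here are purely illustrative):
>>
>>   # /etc/nova/nova-compute.conf, identical on both nodes of the pair
>>   [DEFAULT]
>>   # both nodes deliberately report as the same logical compute host
>>   host = compute-pair-1
>>   # bring guests back up when the logical host recovers
>>   resume_guests_state_on_host_boot = true
>>
>>   # Pacemaker side: a VIP plus the nova-compute service, colocated
>>   pcs resource create p_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10
>>   pcs resource create p_nova_compute systemd:openstack-nova-compute
>>   pcs constraint colocation add p_nova_compute with p_vip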
>>
>> (3) Automatic host evacuation.
>>
>> Not supported in Nova right now, as Adam pointed out at the top of the
>> thread, and repeatedly shot down. If someone were to implement this,
>> it would *still* require that Corosync/Pacemaker be used for
>> monitoring and fencing of nodes, because re-implementing this from
>> scratch would be the reinvention of a wheel while painting a bikeshed.
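>>
>> To illustrate how little of that wheel needs reinventing: node-level
>> fencing in Pacemaker is roughly a one-liner (fence_ipmilan is just one
>> of many available agents, and all parameter values here are made up):
>>
>>   pcs stonith create fence-compute1 fence_ipmilan \
>>     ipaddr=192.0.2.100 login=admin passwd=secret \
>>     pcmk_host_list=compute1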
>>
>> (4) Per-guest HA.
>>
>> This is the idea of just doing "nova boot --keep-this-running", i.e.
>> setting a per-guest flag that means the guest is to be kept up at all
>> times. Again, this is not supported in Nova right now, and it would
>> probably be even more complex to implement generically than (3), at
>> the same or greater cost.
>>
>> I have a suggestion to tackle this that I *think* is reasonably
>> user-friendly while still bearable in terms of Nova development
>> effort:
>>
>> (a) Define a well-known metadata key for a host aggregate, say "ha".
>> Define that any host aggregate that represents a highly available
>> group of compute nodes should have this metadata key set.
>>
>> (b) Then define a flavor that sets extra_specs "ha=true".
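>>
>> With the existing CLI, and assuming the scheduler runs the
>> AggregateInstanceExtraSpecsFilter, the scaffolding might look roughly
>> like this (all names invented for the example):
>>
>>   # (a) an aggregate of Pacemaker-managed compute nodes
>>   nova aggregate-create ha-nodes
>>   nova aggregate-set-metadata <aggregate-id> ha=true
>>   nova aggregate-add-host <aggregate-id> compute-pair-1
>>
>>   # (b) a flavor whose extra spec matches that metadata key
>>   nova flavor-create m1.small.ha auto 2048 20 1
>>   nova flavor-key m1.small.ha set aggregate_instance_extra_specs:ha=true
>>
>> Any guest booted with that flavor would then be scheduled only onto
>> hosts in the "ha" aggregate.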
>>
>> Granted, this places an additional burden on distro vendors to
>> integrate highly-available compute nodes into their deployment
>> infrastructure. But since practically all of them already include
>> Pacemaker, the additional scaffolding required is actually rather
>> limited.
>
>
> Or:
>
> (5) Let monitoring and orchestration services deal with these use cases and
> have Nova simply provide the primitive API calls that it already does (i.e.
> host evacuate).

That would arguably lead to an incredible amount of wheel reinvention
for node failure detection, service failure detection, etc. etc.
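
For reference, the primitives Jay mentions do already exist, e.g.:

  nova evacuate <server> <target-host> --on-shared-storage
  nova host-evacuate <failed-host>

But something external still has to detect the host failure, fence the
node, and then issue those calls, and that something would end up
looking an awful lot like Corosync/Pacemaker.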

Florian


