[openstack-dev] [Nova] Automatic evacuate

Russell Bryant rbryant at redhat.com
Thu Oct 16 03:01:37 UTC 2014


On 10/15/2014 06:30 PM, Jay Pipes wrote:
> 
> 
> On 10/15/2014 04:50 PM, Florian Haas wrote:
>> On Wed, Oct 15, 2014 at 9:58 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>> On 10/15/2014 03:16 PM, Florian Haas wrote:
>>>>
>>>> On Wed, Oct 15, 2014 at 7:20 PM, Russell Bryant <rbryant at redhat.com>
>>>> wrote:
>>>>>
>>>>> On 10/13/2014 05:59 PM, Russell Bryant wrote:
>>>>>>
>>>>>> Nice timing.  I was working on a blog post on this topic.
>>>>>
>>>>>
>>>>> which is now here:
>>>>>
>>>>> http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>>>>
>>>>
>>>>
>>>> I am absolutely loving the fact that we are finally having a
>>>> discussion in earnest about this. I think this deserves a Design
>>>> Summit session.
>>>>
>>>> If I may weigh in here, let me share what I've seen users do and what
>>>> can currently be done, and what may be supported in the future.
>>>>
>>>> Problem: automatically ensure that a Nova guest continues to run, even
>>>> if its host fails.
>>>>
>>>> (That's the general problem description and I don't need to go into
>>>> further details explaining the problem, because Russell has done that
>>>> beautifully in his blog post.)
>>>>
>>>> Now, what are the options?
>>>>
>>>> (1) Punt and leave it to the hypervisor.
>>>>
>>>> This essentially means that you must use a hypervisor that already has
>>>> HA built in, such as VMware with the vCenter driver. In that scenario,
>>>> Nova itself neither deals with HA, nor exposes any HA switches to the
>>>> user. Obvious downside: not generic, doesn't work with all
>>>> hypervisors, most importantly doesn't work with the most popular one
>>>> (libvirt/KVM).
>>>>
>>>> (2) Deploy Nova nodes in pairs/groups, and pretend that they are one
>>>> node.
>>>>
>>>> You can already do that by overriding "host" in nova-compute.conf,
>>>> setting resume_guests_state_on_host_boot, and using VIPs with
>>>> Corosync/Pacemaker. You can then group these hosts in host aggregates,
>>>> and the user's scheduler hint to point a newly scheduled guest to such
>>>> a host aggregate becomes, effectively, the "keep this guest running at
>>>> all times" flag. Upsides: no changes to Nova at all; monitoring,
>>>> fencing and recovery come for free from Corosync/Pacemaker. Downside:
>>>> requires vendors to automate Pacemaker configuration in deployment
>>>> tools (because you really don't want to do those things manually).
>>>> Additional downside: you either have some idle hardware, or you might
>>>> be overcommitting resources in case of failover.
>>>>
>>>> (3) Automatic host evacuation.
>>>>
>>>> Not supported in Nova right now, as Adam pointed out at the top of the
>>>> thread, and repeatedly shot down. If someone were to implement this,
>>>> it would *still* require that Corosync/Pacemaker be used for
>>>> monitoring and fencing of nodes, because re-implementing this from
>>>> scratch would be the reinvention of a wheel while painting a bikeshed.
>>>>
>>>> (4) Per-guest HA.
>>>>
>>>> This is the idea of just doing "nova boot --keep-this running", i.e.
>>>> setting a per-guest flag that still means the machine is to be kept up
>>>> at all times. Again, not supported in Nova right now, and probably
>>>> even more complex to implement generically than (3), at the same or
>>>> greater cost.
>>>>
>>>> I have a suggestion to tackle this that I *think* is reasonably
>>>> user-friendly while still bearable in terms of Nova development
>>>> effort:
>>>>
>>>> (a) Define a well-known metadata key for a host aggregate, say "ha".
>>>> Define that any host aggregate that represents a highly available
>>>> group of compute nodes should have this metadata key set.
>>>>
>>>> (b) Then define a flavor that sets extra_specs "ha=true".
>>>>
>>>> Granted, this places an additional burden on distro vendors to
>>>> integrate highly-available compute nodes into their deployment
>>>> infrastructure. But since practically all of them already include
>>>> Pacemaker, the additional scaffolding required is actually rather
>>>> limited.
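
The matching implied by (a) and (b) above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not Nova
code: host_passes and filter_hosts are hypothetical names, and aggregate
metadata and flavor extra_specs are modelled as plain dicts. Nova's
existing AggregateInstanceExtraSpecsFilter does a comparable kind of
match between flavor extra_specs and aggregate metadata.

```python
def host_passes(flavor_extra_specs, aggregate_metadata_list):
    """Return True if a host may run a guest of the given flavor.

    flavor_extra_specs: dict of flavor extra_specs, e.g. {"ha": "true"}
    aggregate_metadata_list: metadata dicts of the aggregates the host is in
    (Both shapes are assumptions made for this sketch.)
    """
    wanted = flavor_extra_specs.get("ha")
    if wanted is None:
        # The flavor does not ask for HA, so any host is acceptable.
        return True
    # The host qualifies if any of its aggregates carries a matching "ha" key.
    return any(md.get("ha") == wanted for md in aggregate_metadata_list)


def filter_hosts(hosts, flavor_extra_specs):
    """hosts maps host name -> list of aggregate metadata dicts."""
    return sorted(
        name for name, aggs in hosts.items()
        if host_passes(flavor_extra_specs, aggs)
    )
```

With one host in an aggregate tagged "ha": "true" and two hosts outside
it, a flavor carrying "ha": "true" would be limited to the first host,
while a flavor without the key could land anywhere.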
>>>
>>>
>>> Or:
>>>
>>> (5) Let monitoring and orchestration services deal with these use
>>> cases and have Nova simply provide the primitive API calls that it
>>> already does (i.e. host evacuate).
>>
>> That would arguably lead to an incredible amount of wheel reinvention
>> for node failure detection, service failure detection, etc. etc.
> 
> How so? (5) would use existing wheels for monitoring and orchestration
> instead of writing all new code paths inside Nova to do the same thing.

Right, there may be some confusion here ... I thought you were both
agreeing that the use of an external toolset was a good approach for the
problem, but Florian's last message makes that not so clear ...
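
For reference, the external approach in (5) could look roughly like the
sketch below. It is only an illustration: recover_failed_hosts is a
hypothetical name, host_status stands in for whatever the monitoring
tool reports, and fence and evacuate are hypothetical hooks (the latter
would wrap Nova's existing host evacuate call). The one assumption baked
in is that a host is fenced before its guests are evacuated.

```python
def recover_failed_hosts(host_status, fence, evacuate):
    """Fence and evacuate every host the external monitor reports as down.

    host_status: dict of host name -> bool (True means alive); assumed to
                 come from an external monitor such as Corosync/Pacemaker.
    fence(host): hypothetical hook; returns True once the host is
                 confirmed to be powered off.
    evacuate(host): hypothetical hook wrapping Nova's host evacuate call.
    Returns the list of hosts that were evacuated.
    """
    evacuated = []
    for host, alive in sorted(host_status.items()):
        if alive:
            continue
        # Skip hosts that could not be fenced: their guests may still be
        # running, so rebuilding them elsewhere would be unsafe.
        if not fence(host):
            continue
        evacuate(host)
        evacuated.append(host)
    return evacuated
```

Nova's role stays limited to the evacuate primitive here; detection and
fencing live entirely in the external tooling.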

-- 
Russell Bryant


