[openstack-dev] blueprint proposal nova-compute fencing for HA ?

Russell Bryant rbryant at redhat.com
Tue Apr 23 00:43:10 UTC 2013


On 04/22/2013 08:01 PM, Leen Besselink wrote:
> On Mon, Apr 22, 2013 at 07:05:18PM -0400, Russell Bryant wrote:
>> On 04/22/2013 06:32 PM, Leen Besselink wrote:
>>> On Mon, Apr 22, 2013 at 01:02:45PM -0400, Lon Hohberger wrote:
>>>> On 04/22/2013 08:09 AM, Leen Besselink wrote:
>>>>> Hi,
>>>>>
>>>>> As I was not at the summit and the technical videos of the Summit are not yet online, I am not aware of what was discussed there.
>>>>>
>>>>> But I would like to submit a blueprint.
>>>>>
>>>>> My idea is:
>>>>>
>>>>> It is a step to support VM High availability.
>>>>>
>>>>> This part is about handling compute node failure.
>>>>>
>>>>> My proposal would be to create a framework/API/plugin/agent or whatever is needed for fencing off a nova-compute node.
>>>>
>>>>> The code that handles the fencing could be implemented in different ways for different environments:
>>>>>
>>>>> - The IPMI code that handles baremetal provisioning could, for example, be used to power off or reboot the node.
>>>>
>>>> Hi,
>>>>
>>>> This sounds familiar :)  These have been integrated into several
>>>> projects, including Pacemaker and oVirt:
>>>>
>>>> https://git.fedorahosted.org/git/fence-agents.git
>>>>
>>>
>>> Ahh, of course, that is a good idea. Thanks.
>>>
>>> It is also packaged in Debian/Ubuntu.
>>>
>>> I even had it checked out on my desktop, so I had seen it before. And I should have
>>> known better. :-)
>>>
>>> So where would the code that calls such fence-agents best fit into OpenStack ?
>>>
>>> Or maybe this is another new service in OpenStack (like there aren't enough already) ?
>>>
>>> I guess it would run on a machine where you would also find something like the Nova
>>> baremetal deploy helper service.
>>
>> It seems like something that doesn't belong in OpenStack.  Take a look
>> at Pacemaker.  It does quite a bit of this already.  You can have it
>> monitoring a compute node, detect failure, and react accordingly,
>> including the fencing part.
>>
> 
> I know it does that, but how far can you scale that ?
> 
> 16 nodes ?

IIRC, the underlying messaging infrastructure (Corosync) supports 64
nodes.  Don't quote me on that, though.  :-)

> What do you do if you have multiple nodes fail at once in that cluster ?
> 
> You might not have enough capacity in the cluster to run the instances,
> but you might have enough capacity on machines outside of that cluster.
> 
> So how do you deal with things like that ?

I was only addressing the part about fencing off a compute node, since
that's what you started the thread with.  :-)

Presumably you would still use nova APIs that already exist to move the
instances elsewhere.  An 'evacuate' API went into Grizzly for this.

https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha
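To make the ordering concrete, here is a rough sketch of the "fence first, then evacuate" flow. It is only an illustration under assumptions, not how Pacemaker or nova implement it: `build_fence_command`, `recover_host`, and the `fence`, `evacuate`, and `pick_target` callables are hypothetical names, and the `fence_ipmilan` options (`-a`, `-l`, `-p`, `-o`) should be checked against the man page of the installed fence-agents version. The callables are injected so the real fence agent and the real nova call can be swapped in.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_fence_command(bmc_address: str, username: str, password: str,
                        action: str = "off") -> List[str]:
    """Build an argument list for the fence_ipmilan agent.

    Assumption: the short options -a (address), -l (login), -p (password)
    and -o (action) are accepted by the installed agent. A real deployment
    would avoid putting the password on the command line.
    """
    return ["fence_ipmilan", "-a", bmc_address, "-l", username,
            "-p", password, "-o", action]


def recover_host(failed_host: str,
                 instances: Iterable[str],
                 fence: Callable[[str], bool],
                 evacuate: Callable[[str, str], None],
                 pick_target: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Fence a failed compute node, then evacuate its instances.

    fence(host) must return True only when the node is confirmed off.
    evacuate(instance, target) stands in for the nova 'evacuate' call.
    pick_target(instance) returns a host with spare capacity, or None.
    """
    # Never evacuate unless fencing succeeded: otherwise the same instance
    # could end up running on two hosts at once.
    if not fence(failed_host):
        return {}

    moved: Dict[str, str] = {}
    for instance in instances:
        target = pick_target(instance)
        if target is None:
            # No capacity anywhere; leave this instance for an operator.
            continue
        evacuate(instance, target)
        moved[instance] = target
    return moved
```

In practice `fence` could run the command from `build_fence_command` via `subprocess` and check the exit status, and `pick_target` is where the capacity question (hosts outside the failed cluster) would be answered.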

-- 
Russell Bryant


