[openstack-dev] blueprint proposal nova-compute fencing for HA ?
Russell Bryant
rbryant at redhat.com
Tue Apr 23 00:43:10 UTC 2013
On 04/22/2013 08:01 PM, Leen Besselink wrote:
> On Mon, Apr 22, 2013 at 07:05:18PM -0400, Russell Bryant wrote:
>> On 04/22/2013 06:32 PM, Leen Besselink wrote:
>>> On Mon, Apr 22, 2013 at 01:02:45PM -0400, Lon Hohberger wrote:
>>>> On 04/22/2013 08:09 AM, Leen Besselink wrote:
>>>>> Hi,
>>>>>
>>>>> As I was not at the summit, and the technical videos of the Summit are not yet online, I am not aware of what was discussed there.
>>>>>
>>>>> But I would like to submit a blueprint.
>>>>>
>>>>> My idea is:
>>>>>
>>>>> It is a step to support VM High availability.
>>>>>
>>>>> This part is about handling compute node failure.
>>>>>
>>>>> My proposal would be to create a framework/API/plugin/agent or whatever is needed for fencing off a nova-compute node.
>>>>
>>>>> The code that handles the fencing could be implemented in different ways for different environments:
>>>>>
>>>>> - The IPMI code that handles baremetal provisioning could, for example, be used to power off or reboot the node.
>>>>
>>>> Hi,
>>>>
>>>> This sounds familiar :) These have been integrated into several
>>>> projects, including Pacemaker and oVirt:
>>>>
>>>> https://git.fedorahosted.org/git/fence-agents.git
>>>>
>>>
>>> Ahh, of course, that is a good idea. Thanks.
>>>
>>> It is also packaged in Debian/Ubuntu.
>>>
>>> I even had it checked out on my desktop, so I had seen it before. And I should have
>>> known better. :-)
>>>
>>> So where would the code that calls such fence-agents best fit into OpenStack ?
>>>
>>> Or maybe this is another new service in OpenStack (like there aren't enough already) ?
>>>
>>> I guess it would run on a machine where you would also find something like the Nova
>>> baremetal deploy helper service.
>>
>> It seems like something that doesn't belong in OpenStack. Take a look
>> at Pacemaker. It does quite a bit of this already. You can have it
>> monitoring a compute node, detect failure, and react accordingly,
>> including the fencing part.
>>
>
> I know it does that, but how far can you scale that ?
>
> 16 nodes ?
IIRC, the underlying messaging infrastructure (Corosync) supports 64
nodes. Don't quote me on that, though. :-)
> What do you do if multiple nodes fail at once in that cluster ?
>
> You might not have enough capacity in the cluster to run the instances,
> but you might have enough capacity on machines outside of that cluster.
>
> So how do you deal with things like that ?
I was only talking about the fencing off a compute node part, since
that's what you started the thread with. :-)
Presumably you would still use nova APIs that already exist to move the
instances elsewhere. An 'evacuate' API went into Grizzly for this.
https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha
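
To illustrate the split described above (fencing handled outside nova,
then existing nova APIs used to move the instances), here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
recover_failed_node, fence and evacuate are hypothetical names, not part
of nova, Pacemaker or fence-agents. In a real deployment, fence might
wrap a fence agent and evacuate might wrap the nova 'evacuate' call.

```python
from typing import Callable, Iterable, List


def recover_failed_node(
    node: str,
    instances: Iterable[str],
    fence: Callable[[str], bool],
    evacuate: Callable[[str], None],
) -> List[str]:
    """Fence `node`, then evacuate its instances; return those evacuated.

    `fence` and `evacuate` are hypothetical callables supplied by the
    caller. `fence` must return True only when the node is confirmed off.
    """
    if not fence(node):
        # Fencing is unconfirmed, so the node may still be running the
        # instances. Evacuating now could start the same instance twice.
        return []

    evacuated = []
    for instance in instances:
        # Hand each instance to whatever moves it to another host.
        evacuate(instance)
        evacuated.append(instance)
    return evacuated
```

The point of the ordering is that nothing is moved until the failed node
is known to be down; how the node is fenced and where the instances land
are left to the two callables.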
--
Russell Bryant