[openstack-dev] blueprint proposal nova-compute fencing for HA ?
ubuntu at consolejunkie.net
Tue Apr 23 00:01:05 UTC 2013
On Mon, Apr 22, 2013 at 07:05:18PM -0400, Russell Bryant wrote:
> On 04/22/2013 06:32 PM, Leen Besselink wrote:
> > On Mon, Apr 22, 2013 at 01:02:45PM -0400, Lon Hohberger wrote:
> >> On 04/22/2013 08:09 AM, Leen Besselink wrote:
> >>> Hi,
> >>> As I was not at the Summit and the technical videos from the Summit are not yet online, I am not aware of what was discussed there.
> >>> But I would like to submit a blueprint.
> >>> My idea is:
> >>> It is a step towards supporting VM high availability.
> >>> This part is about handling compute node failure.
> >>> My proposal would be to create a framework/API/plugin/agent or whatever is needed for fencing off a nova-compute node.
> >>> The code that handles the fencing could be implemented in different ways for different environments:
> >>> - The IPMI code that handles baremetal provisioning could, for example, be used to power off or reboot the node.
> >> Hi,
> >> This sounds familiar :) These have been integrated into several
> >> projects, including Pacemaker and oVirt:
> >> https://git.fedorahosted.org/git/fence-agents.git
> > Ahh, of course, that is a good idea. Thanks.
> > It is also packaged in Debian/Ubuntu.
> > I even had it checked out on my desktop, so I had seen it before. And I should have
> > known better. :-)
> > So where would the code that calls such fence agents best fit into OpenStack?
> > Or maybe this is another new service in OpenStack (as if there weren't enough already)?
> > I guess it would run on a machine where you would also find something like the Nova
> > baremetal deploy helper service.
> It seems like something that doesn't belong in OpenStack. Take a look
> at Pacemaker. It does quite a bit of this already. You can have it
> monitoring a compute node, detect failure, and react accordingly,
> including the fencing part.
I know it does that, but how far can you scale that?
16 nodes?
What do you do if multiple nodes in that cluster fail at once?
You might not have enough capacity in the cluster to run the instances,
but you might have enough capacity on machines outside of that cluster.
So how do you deal with situations like that?
> >> There's a standalone API that they follow which simply takes stdin
> >> parameter=value assignments.
> >> These agents call out to IPMI, iLO, DRAC, RSA, and other integrated
> >> hardware as well as external power switches for controlling host power.
> >> Many of them are written in Python (or C) and should require little, if
> >> anything, beyond what OpenStack already requires.
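The stdin convention Lon describes (one parameter=value assignment per line) can be illustrated with a short Python sketch of how an OpenStack-side caller might invoke such an agent. This is a minimal sketch under stated assumptions, not code from fence-agents or Nova: the helper names `build_fence_stdin` and `run_fence_agent` are hypothetical, and the option names used below (`action`, `ipaddr`, `login`, `passwd`) are assumptions that should be checked against the specific agent's documentation.

```python
import subprocess
from typing import Mapping


def build_fence_stdin(options: Mapping[str, str]) -> str:
    """Serialise options as one parameter=value assignment per line.

    Hypothetical helper: keys containing '=' or newlines, and values
    containing newlines, are rejected because they would corrupt the
    line-oriented stdin format.
    """
    lines = []
    for key, value in options.items():
        value = str(value)
        if not key or "=" in key or "\n" in key or "\n" in value:
            raise ValueError(f"invalid fence option: {key!r}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def run_fence_agent(agent: str, options: Mapping[str, str],
                    timeout: float = 60.0) -> int:
    """Run a fence agent executable, feeding its options on stdin.

    Hypothetical wrapper: returns the agent's exit status. Raises
    FileNotFoundError if the agent is not installed and
    subprocess.TimeoutExpired if it does not finish within `timeout`.
    """
    result = subprocess.run(
        [agent],
        input=build_fence_stdin(options),
        text=True,
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode
```

For example, `run_fence_agent("fence_ipmilan", {"action": "reboot", "ipaddr": "10.0.0.42", "login": "admin", "passwd": "secret"})` would ask the IPMI agent to reboot a node; the address and credentials here are placeholders. An exit status of 0 conventionally indicates success, so a caller would only treat the node's instances as safe to restart elsewhere after a zero return.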
> Russell Bryant