[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?
Charles Crouch
ccrouch at redhat.com
Tue Jul 22 17:14:29 UTC 2014
----- Original Message -----
> Hi,
>
> I'm running an HA overcloud configuration and as far as I'm aware, there is
> currently no mechanism in place for restarting failed nodes in the cluster.
> Originally, I had been wondering if we would use a corosync/pacemaker
> cluster across the control plane with STONITH resources configured for each
> node (a STONITH plugin for Ironic could be written).
I know some people are starting to look at how to use pacemaker for
fencing/recovery with TripleO, but I'm not aware of any proposals yet.
I'm sure that as soon as one is published it will hit this list.
> This might be fine if a
> corosync/pacemaker stack is already being used for HA of some components,
> but it seems overkill otherwise.
There is a pending patch that adds support for using pacemaker to manage
active/passive (A/P) services: https://review.openstack.org/#/c/105397/
I'd expect additional patches like this in the future.
> The undercloud heat could be in a good
> position to restart the overcloud nodes -- is that the plan or are there
> other options being considered?
>
> Thanks,
> Tom