[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?
Howley, Tom
tom.howley at hp.com
Wed Jul 23 10:31:00 UTC 2014
(Resending to properly start new thread.)
Hi,
I'm running an HA overcloud configuration and, as far as I'm aware, there is currently no mechanism in place for restarting failed nodes in the cluster. Originally, I had wondered whether we would use a corosync/pacemaker cluster across the control plane, with STONITH resources configured for each node (a STONITH plugin for Ironic could be written). This might be fine if a corosync/pacemaker stack is already being used for HA of some components, but it seems like overkill otherwise. The undercloud Heat could be in a good position to restart the overcloud nodes. Is that the plan, or are other options being considered?
Thanks,
Tom