[openstack-dev] [TripleO] Strategy for recovering crashed nodes in the Overcloud?
    Charles Crouch 
    ccrouch at redhat.com
       
    Tue Jul 22 17:14:29 UTC 2014
    
    
  
----- Original Message -----
> Hi,
> 
> I'm running an HA overcloud configuration and as far as I'm aware, there is
> currently no mechanism in place for restarting failed nodes in the cluster.
> Originally, I had been wondering if we would use a corosync/pacemaker
> cluster across the control plane with STONITH resources configured for each
> node (a STONITH plugin for Ironic could be written). 
I know some people are starting to look at how to use Pacemaker for fencing
and recovery with TripleO, but I'm not aware of any published proposals yet.
I'm sure that as soon as one is published it will be posted to this list.
> This might be fine if a
> corosync/pacemaker stack is already being used for HA of some components,
> but it seems overkill otherwise. 
There is a pending patch that adds support for using Pacemaker to manage
active/passive (A/P) services: https://review.openstack.org/#/c/105397/
I'd expect additional patches like this in the future.
> The undercloud heat could be in a good
> position to restart the overcloud nodes -- is that the plan or are there
> other options being considered?
> 
> Thanks,
> Tom
> 
    
    