[openstack-dev] [Nova][Heat] How to reliably detect VM failures?

Zane Bitter zbitter at redhat.com
Wed Mar 19 16:08:30 UTC 2014

On 19/03/14 02:07, Chris Friesen wrote:
> On 03/18/2014 11:18 AM, Zane Bitter wrote:
>> On 18/03/14 12:42, Steven Dake wrote:
>>> You should be able to use the HARestarter resource and functionality to
>>> do healthchecking of a vm.
>> HARestarter is actually pretty problematic, both in a "causes major
>> architectural headaches for Heat and will probably be deprecated very
>> soon" sense and a "may do very unexpected things to your resources"
>> sense. I wouldn't recommend it.
> Could you elaborate?  What unexpected things might it do?  And what are
> the alternatives?

First of all, despite the name, it doesn't just restart the server: it
actually deletes the server that it's monitoring and creates an entirely
new one. It also deletes and recreates any resources that directly or
indirectly depend on the server being monitored.
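
To make "directly or indirectly depend" concrete, here is a rough sketch
of how that set of affected resources could be computed. This is not
Heat's actual code; the function name, the graph format and the example
stack are all made up for illustration.

```python
from collections import deque


def affected_resources(depends_on, target):
    """Return target plus every resource that depends on it, directly or not.

    depends_on maps each resource name to the names it depends on.
    (Hypothetical format, not how Heat stores its dependency graph.)
    """
    # Invert the edges so we can walk from a resource to its dependents.
    dependents = {}
    for resource, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(resource)

    # Breadth-first walk over everything reachable from the target.
    affected = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical stack: resource -> resources it depends on.
stack = {
    "server": ["port"],
    "port": ["network"],
    "network": [],
    "volume": [],
    "volume_attachment": ["server", "volume"],
    "floating_ip_assoc": ["server"],
    "dns_record": ["floating_ip_assoc"],
}

print(sorted(affected_resources(stack, "server")))
# ['dns_record', 'floating_ip_assoc', 'server', 'volume_attachment']
```

In this example the port, network and volume survive, but everything
sitting on top of the server gets replaced along with it, which may not
be what you expected from something called a "restarter".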

The alternative is to use Ceilometer alarms and/or some external 
monitoring system and implement recovery yourself, since the strategy 
you want depends on both your application and the type of failure.
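
As a rough illustration of why there is no one-size-fits-all answer,
something like the following could sit behind whatever alarm or
monitoring hook you use. Everything here (the failure types, the
actions, the "stateful" flag) is a hypothetical example, not a
Ceilometer or Heat API; the point is only that the choice takes both the
failure and the application as input.

```python
from enum import Enum


class Failure(Enum):
    # Hypothetical failure types reported by your monitoring system.
    APP_CRASHED = "app_crashed"
    GUEST_HUNG = "guest_hung"
    HOST_DOWN = "host_down"


class Action(Enum):
    # Hypothetical recovery actions you would implement yourself.
    RESTART_SERVICE = "restart_service"
    REBOOT_SERVER = "reboot_server"
    REPLACE_SERVER = "replace_server"
    ALERT_OPERATOR = "alert_operator"


def choose_recovery(failure, app_is_stateful):
    """Pick a recovery action from the failure type and the application."""
    if failure is Failure.APP_CRASHED:
        # The VM itself is fine, so touch as little as possible.
        return Action.RESTART_SERVICE
    if failure is Failure.GUEST_HUNG:
        return Action.REBOOT_SERVER
    if failure is Failure.HOST_DOWN:
        # Replacing a stateful server throws away its local state, so
        # leave that decision to a human in this sketch.
        if app_is_stateful:
            return Action.ALERT_OPERATOR
        return Action.REPLACE_SERVER
    return Action.ALERT_OPERATOR


print(choose_recovery(Failure.HOST_DOWN, app_is_stateful=False).value)
# replace_server
```

Note that only one branch of this ends up doing what HARestarter does
unconditionally.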

Another avenue being explored in Heat is to have a general way of 
bringing a stack back into line with its template:

