[Openstack-operators] What to do when a compute node dies?

Carlos Goncalves Carlos.Goncalves at neclab.eu
Mon Mar 30 09:53:12 UTC 2015


Hi Mike,

While not directly answering your question, allow me to share with you the OPNFV Doctor (https://wiki.opnfv.org/doctor), a fault management and maintenance project that extends and uses OpenStack.

The team should deliver the final document today for the two-week project-wide review. In the meantime, you can check the latest available draft here: http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2015-March/001629.html.

Feedback is welcome!

Thanks,
Carlos

Carlos Gonçalves | NEC Europe Ltd. | Kurfürsten-Anlage 36 | 69115 Heidelberg | Germany | +49 6221 4342-217
NEC Europe Ltd | Registered Office: Athene, Odyssey Business Park, West End Road, London, HA4 6QE, GB | Registered in England 2832014

From: Mike Dorman [mailto:mdorman at godaddy.com]
Sent: 30 March 2015 05:26
To: OpenStack Operators
Subject: [Openstack-operators] What to do when a compute node dies?

Hi all,

I’m curious about how people deal with failures of compute nodes, as in total failure when the box is gone for good.  (I mainly care about KVM hypervisors, but I’m also interested in more general cases.)

The particular situation we’re looking at: how end users could identify, or be notified of, VMs that no longer exist because their hypervisor is dead.  As I understand it, Nova will still believe the VMs are running, and really has no way to know anything has changed (other than that the nova-compute instance has dropped off).

I understand failure detection is a tricky thing.  But it seems like there must be something a little better than this.
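
To make the gap concrete, here is a rough sketch of the cross-check I have in mind: take the hosts whose nova-compute service is reported as down, and list the instances that are still recorded as ACTIVE on those hosts. The ComputeService and Instance classes and the find_stranded_instances function are hypothetical stand-ins for illustration, not Nova code. The "up"/"down" and "ACTIVE" values mirror what the Compute API reports, but how the data is fetched is left out, and a down service does not by itself prove the box is gone.

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    """Hypothetical record for one nova-compute service."""
    host: str
    state: str  # "up" or "down", as reported for the service


@dataclass
class Instance:
    """Hypothetical record for one VM as Nova has it on file."""
    name: str
    host: str
    status: str  # e.g. "ACTIVE" or "SHUTOFF"


def find_stranded_instances(services, instances):
    """Return instances still marked ACTIVE on hosts whose service is down."""
    # Collect the hosts whose nova-compute service has dropped off.
    down_hosts = {s.host for s in services if s.state == "down"}
    # Keep only instances that Nova still believes are running there.
    return [
        i for i in instances
        if i.host in down_hosts and i.status == "ACTIVE"
    ]
```

The output of something like this would only be a list of candidates to notify users about; deciding that a host is really dead still needs some out-of-band confirmation.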

Thanks,
Mike
