[openstack-dev] [Nova] Automatic evacuate

Jay Pipes jaypipes at gmail.com
Tue Oct 14 17:01:04 UTC 2014


On 10/13/2014 05:59 PM, Russell Bryant wrote:
> Nice timing.  I was working on a blog post on this topic.
>
> On 10/13/2014 05:40 PM, Fei Long Wang wrote:
>> I think Adam is talking about this bp:
>> https://blueprints.launchpad.net/nova/+spec/evacuate-instance-automatically
>>
>> For now, we're using Nagios probe/event to trigger the Nova evacuate
>> command, but I think it's possible to do that in Nova if we can find a
>> good way to define the trigger policy.
>
> I actually think that's the right way to do it.

+1. Not everything needs to be built in to Nova. This very much sounds 
like something that should be handled by PaaS-layer things that can 
react to a Nagios notification (or any other event) and take some sort 
of action, possibly using "administrative" commands like nova evacuate.
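
As a rough illustration of the "trigger policy" part, here is a minimal 
sketch of the decision an external event handler might make before taking 
any action. It assumes the handler is passed the values of the Nagios 
$HOSTSTATE$ and $HOSTSTATETYPE$ macros; the function name should_trigger 
is hypothetical, and the policy itself (act only on a confirmed DOWN) is 
an assumption, not something decided in this thread.

```python
def should_trigger(host_state: str, state_type: str) -> bool:
    """Return True only for a confirmed host failure.

    host_state is expected to be the value of the Nagios $HOSTSTATE$ macro
    (UP, DOWN or UNREACHABLE) and state_type the value of $HOSTSTATETYPE$
    (SOFT or HARD). The policy here is an assumption made for illustration:
    SOFT states are still being re-checked, and UNREACHABLE may only mean
    that a router between the monitor and the host is down.
    """
    return host_state.strip().upper() == "DOWN" and state_type.strip().upper() == "HARD"
```

A real handler would call this first and only then go on to fencing and 
evacuation, so that a single missed check never moves anyone's instances.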

> There are a couple of
> other things to consider:
>
> 1) An ideal solution also includes fencing.  When you evacuate, you want
> to make sure you've fenced the original compute node.  You need to make
> absolutely sure that the same VM can't be running more than once,
> especially when the disks are backed by shared storage.
>
> Because of the fencing requirement, another option would be to use
> Pacemaker to orchestrate this whole thing.  Historically Pacemaker
> hasn't been suitable to scale to the number of compute nodes an
> OpenStack deployment might have, but Pacemaker has a new feature called
> pacemaker_remote [1] that may be suitable.
>
> 2) Looking forward, there is a lot of demand for doing this on a per
> instance basis.  We should decide on a best practice for allowing end
> users to indicate whether they would like their VMs automatically
> rescued by the infrastructure, or just left down in the case of a
> failure.  It could be as simple as a special tag set on an instance [2].

Please note that server instance tagging (thanks for the shout-out, BTW) 
is intended only for user-defined tags, not system-defined metadata, 
which is what this sounds like...

Of course, one might implement some external polling/monitoring system 
using server instance tags, which might do a nova list --tag $TAG --host 
$FAILING_HOST, and initiate a migrate for each returned server instance...
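
A minimal sketch of what such an external tool might look like, under 
several assumptions: the --tag filter is the one proposed in the tagging 
review and may not exist in any released client; the fence callable is a 
hypothetical hook for whatever fencing mechanism a deployment uses; and the 
sketch calls nova evacuate (the command this thread is about) rather than a 
migrate, since the source host is assumed to be down. The function names 
are made up for illustration.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the exit code is non-zero."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout


def parse_server_ids(table: str) -> List[str]:
    """Pull the first column out of the ASCII table that `nova list` prints."""
    ids = []
    for line in table.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        first_cell = line.strip("|").split("|")[0].strip()
        if first_cell and first_cell != "ID":  # skip the header row
            ids.append(first_cell)
    return ids


def evacuate_tagged(
    failed_host: str,
    tag: str,
    fence: Callable[[str], bool],
    run: Runner = run_cli,
    target_host: Optional[str] = None,
) -> List[str]:
    """Fence the failed host, then evacuate every tagged instance on it."""
    # Never evacuate before the host is confirmed fenced, otherwise the same
    # VM could end up running twice against shared storage.
    if not fence(failed_host):
        raise RuntimeError("could not fence %s; refusing to evacuate" % failed_host)

    # ASSUMPTION: --tag is the filter proposed in the review, not a confirmed flag.
    listing = run(["nova", "list", "--all-tenants", "--tag", tag, "--host", failed_host])
    server_ids = parse_server_ids(listing)

    for server_id in server_ids:
        cmd = ["nova", "evacuate", server_id]
        if target_host:
            cmd.append(target_host)
        run(cmd)
    return server_ids
```

The run parameter exists only so the command execution can be swapped out 
(for a dry run, say); something like 
evacuate_tagged("compute-12", "auto-evacuate", fence=my_fence) would be the 
normal call, where my_fence is whatever a deployment provides.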

Best,
-jay

> [1]
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/
> [2] https://review.openstack.org/#/c/127281/
>


