[openstack-dev] [Nova] Automatic evacuate
Tim Bell
Tim.Bell at cern.ch
Tue Oct 14 18:49:41 UTC 2014
> -----Original Message-----
> From: Jay Pipes [mailto:jaypipes at gmail.com]
> Sent: 14 October 2014 19:01
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [Nova] Automatic evacuate
>
> On 10/13/2014 05:59 PM, Russell Bryant wrote:
> > Nice timing. I was working on a blog post on this topic.
> >
> > On 10/13/2014 05:40 PM, Fei Long Wang wrote:
> >> I think Adam is talking about this bp:
> >> https://blueprints.launchpad.net/nova/+spec/evacuate-instance-automatically
> >>
> >> For now, we're using Nagios probe/event to trigger the Nova evacuate
> >> command, but I think it's possible to do that in Nova if we can find
> >> a good way to define the trigger policy.
> >
> > I actually think that's the right way to do it.
>
> +1. Not everything needs to be built-in to Nova. This very much sounds
> like something that should be handled by PaaS-layer things that can react to a
> Nagios notification (or any other event) and take some sort of action, possibly
> using "administrative" commands like nova evacuate.
>
Nova is also not the right place for the generic solution, as many other components could be involved; Neutron and Cinder come to mind. Nova needs to provide the basic functions, but something outside Nova has to make it all happen transparently.
I would really like a shared solution rather than each deployment building its own and facing identical problems. What is needed is a best-of-breed solution, incrementally improved as we find problems, that detects the hypervisor-down event, forces the detach of boot volumes, restarts the instances elsewhere and reconfigures floating IPs, while coping with the race conditions involved.
Some standards for tagging are good, but we also need some code :-)
Tim
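
To make the flow discussed in this thread concrete, here is a rough sketch of what an external handler might do when a monitoring event reports a compute node as down: fence the node first, then evacuate only the instances that opted in. Everything in it is an assumption for illustration: the Instance record, the "auto-evacuate" tag name, and the fence/evacuate callables are hypothetical and are not Nova APIs. In a real deployment the callables would wrap whatever fencing agent is in use and an administrative command such as nova evacuate. Volume detach and floating IP handling are left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Instance:
    """Minimal stand-in for a server record; not a Nova API object."""
    uuid: str
    host: str
    tags: FrozenSet[str] = frozenset()


def select_for_evacuation(instances: Iterable[Instance],
                          failed_host: str,
                          opt_in_tag: str) -> List[str]:
    """Return UUIDs of instances on the failed host that carry the opt-in tag."""
    return [i.uuid for i in instances
            if i.host == failed_host and opt_in_tag in i.tags]


def handle_host_down(failed_host: str,
                     instances: Iterable[Instance],
                     fence: Callable[[str], bool],
                     evacuate: Callable[[str], None],
                     opt_in_tag: str = "auto-evacuate") -> List[str]:
    """React to a 'host down' event from an external monitor.

    fence(host) must return True only once the node is known to be off;
    evacuate(uuid) is assumed to trigger the rebuild on another host.
    """
    # Fence first: with shared storage, the same VM must never run twice.
    if not fence(failed_host):
        return []  # could not confirm the node is dead, so do nothing

    evacuated = []
    for uuid in select_for_evacuation(instances, failed_host, opt_in_tag):
        evacuate(uuid)
        evacuated.append(uuid)
    return evacuated
```

Passing the fencing and evacuation steps in as callables keeps the policy (which instances, in what order, under what precondition) separate from the mechanism, which is the part a shared solution could standardise.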
> > There are a couple of
> > other things to consider:
> >
> > 1) An ideal solution also includes fencing. When you evacuate, you
> > want to make sure you've fenced the original compute node. You need
> > to make absolutely sure that the same VM can't be running more than
> > once, especially when the disks are backed by shared storage.
> >
> > Because of the fencing requirement, another option would be to use
> > Pacemaker to orchestrate this whole thing. Historically Pacemaker
> > hasn't been suitable to scale to the number of compute nodes an
> > OpenStack deployment might have, but Pacemaker has a new feature
> > called pacemaker_remote [1] that may be suitable.
> >
> > 2) Looking forward, there is a lot of demand for doing this on a per
> > instance basis. We should decide on a best practice for allowing end
> > users to indicate whether they would like their VMs automatically
> > rescued by the infrastructure, or just left down in the case of a
> > failure. It could be as simple as a special tag set on an instance [2].
>
> Please note that server instance tagging (thanks for the shout-out, BTW) is
> intended for only user-defined tags, not system-defined metadata which is what
> this sounds like...
>
> Of course, one might implement some external polling/monitoring system using
> server instance tags, which might do a nova list --tag $TAG --host
> $FAILING_HOST, and initiate a migrate for each returned server instance...
>
> Best,
> -jay
>
> > [1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/
> > [2] https://review.openstack.org/#/c/127281/
> >