[openstack-dev] [nova][mistral] Automatic evacuation as a long running task
Steve Gordon
sgordon at redhat.com
Tue Oct 6 14:46:38 UTC 2015
----- Original Message -----
> From: "Roman Dobosz" <roman.dobosz at intel.com>
> To: "OpenStack Development Mailing List" <openstack-dev at lists.openstack.org>
>
> Hi all,
>
> The case of automatic evacuation (or resurrection currently) is a topic
> that surfaces once in a while, but it isn't yet fully supported by
> OpenStack and/or by the cluster services. There were some attempts to
> bring the feature into OpenStack, but it turned out that it cannot be
> easily integrated. On the other hand, evacuation may be initiated
> from the outside using the Nova client or Nova API calls.
>
> I did some research into how it could be designed, using
> Russell Bryant's blog post[1] as a starting point. Beyond that, I've
> also taken high availability and reliability into consideration when
> designing the solution.
>
> Together with a coworker, I built a first PoC[2] that enables a cluster
> to perform evacuation. The idea behind the PoC was simple: provide an
> additional, small service that triggers and supervises the evacuation
> process. The service is itself triggered from the outside (in this
> example we used the Pacemaker fencing facility, but it might be
> anything) using RabbitMQ directly. The services run on the control
> plane in active/active (AA) fashion.
Hi Roman,
Another aspect of this, which we discussed briefly a few weeks back, is whether external HA solutions like the one proposed by Russell should be "opt-in" on a per-instance basis via an image property or flavor extra specification. That is, the external instance high-availability solution would only automatically move virtual machines that have this attribute associated with them, whatever it ends up being.
I'm wondering if there is any appetite in the community for standardizing on what this literal property or extra specification would be, even though delivery of the HA solutions themselves is not part of Nova but is handled by deployers/distributors using external tools like Pacemaker?
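To make the opt-in idea concrete, here is a minimal sketch of how an external HA tool might decide which instances to move. Everything in it is an assumption for illustration: the key name `example:auto_evacuate` is hypothetical (no name has been agreed on), the instances are plain dictionaries rather than Nova objects, and the set of accepted truthy values is arbitrary.

```python
# Hypothetical key name; no standard property or extra spec has been agreed on.
EXAMPLE_OPT_IN_KEY = "example:auto_evacuate"

# Assumed set of values that count as "opted in".
TRUTHY = ("true", "1", "yes")


def wants_auto_evacuation(image_properties, flavor_extra_specs,
                          key=EXAMPLE_OPT_IN_KEY):
    """Return True if the image or the flavor opts the instance in."""
    for source in (flavor_extra_specs, image_properties):
        value = source.get(key)
        if value is not None and str(value).strip().lower() in TRUTHY:
            return True
    return False


def select_instances_to_evacuate(instances):
    """Return the ids of the opted-in instances from a list of plain dicts."""
    return [
        inst["id"]
        for inst in instances
        if wants_auto_evacuation(inst.get("image_properties", {}),
                                 inst.get("flavor_extra_specs", {}))
    ]
```

An external tool would call something like `select_instances_to_evacuate()` on the instances of a fenced host and trigger evacuation only for the returned ids. Whether the image or the flavor should take precedence when they disagree is left open here.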
Thanks,
Steve
> That worked well for us, so we started exploring other possibilities,
> such as using oslo.messaging in the same manner as we did in the PoC.
> It turns out that the implementation will not be as easy, because
> oslo.messaging has no facility for sending an ACK from the client
> after the job is done (rather than as soon as it gets the message). We
> also looked at the existing OpenStack projects for a candidate that
> provides a service for managing long-running tasks.
>
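The difference between acknowledging on receipt and acknowledging after the job is done can be illustrated with a small in-memory model. This is only a sketch: `LateAckQueue` and `run_worker` are hypothetical names, and the class is a stand-in for a broker, not the oslo.messaging or RabbitMQ API.

```python
import collections


class LateAckQueue:
    """In-memory stand-in for a broker that redelivers unacked messages."""

    def __init__(self):
        self._ready = collections.deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def get(self):
        """Hand out one message; it stays unacked until ack() is called."""
        if not self._ready:
            return None
        self._next_tag += 1
        body = self._ready.popleft()
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self._unacked[tag]

    def consumer_died(self):
        """Requeue everything that was delivered but never acknowledged."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked[tag])
        self._unacked.clear()

    def pending(self):
        return len(self._ready) + len(self._unacked)


def run_worker(queue, handler):
    """Process one message, acknowledging only after handler succeeds."""
    delivery = queue.get()
    if delivery is None:
        return False
    tag, body = delivery
    handler(body)   # may raise; in that case no ACK is sent
    queue.ack(tag)  # late ACK: the job is finished at this point
    return True
```

If a worker fails partway through `handler`, the message is never acknowledged and becomes available to another worker once the broker notices the failure (modelled here by `consumer_died()`). With an ACK on receipt, the same failure would silently lose the evacuation request.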
> There is the Mistral project, which gives us almost all the features we
> need. The one missing feature is HA for Mistral task execution.
>
> The question is: how could this problem (long-running tasks) be solved
> in OpenStack?
>
> [1] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
> [2] https://github.com/dawiddeja/evacuationd