[openstack-dev] [nova][mistral] Automatic evacuation as a long running task

Renat Akhmerov rakhmerov at mirantis.com
Tue Oct 6 09:51:45 UTC 2015


Here are some things that may help you:
In Mistral we’ve been aware of this post-message-processing ACK problem since we began to use oslo.messaging, and we’ve been working with the oslo team to fix it. Patch [1] is supposed to help us finally solve it. I would encourage you to participate in that effort too, to make sure it matches your understanding of the problem. We’ve also seen the bug [2] that you filed on Launchpad, so we’ll be updating its status.
As for Mistral HA: it is supported by design, but there are a number of issues with its implementation. This is not strictly HA information but, FYI, there are existing Mistral installations working in production with multiple Mistral engines, executors and API servers, although I have to admit that it’s not yet easy to make such installations work reliably. We keep working on it, and we have big plans for making Mistral HA in the Mitaka cycle. A significant part of the design sessions in Tokyo will be about exactly that, which includes a lot of things: proper testing, profiling, identifying points of failure and overall performance improvement (which also influences overall robustness).
As for the task you’re trying to solve, IMO Mistral is a good candidate for it, because it is a standalone, reliable service that can take the execution of a long process under its control. This is one of the main ideas behind it. We are currently planning to address similar cases with Mistral within our company. I think we’ll share the results once we get something done and described.
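
To make the ACK problem from the first point concrete, here is a minimal sketch in Python. It uses neither oslo.messaging nor RabbitMQ: InMemoryBroker, consume_once, run_scenario and the "evacuate host-1" message are all hypothetical names made up for this illustration. It only shows why acknowledging after the job is done matters if a worker dies in the middle of a long task.

```python
from collections import deque


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, not a real API)."""

    def __init__(self):
        self._ready = deque()  # messages waiting for a consumer
        self._unacked = {}     # delivery tag -> message delivered but not ACKed
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)

    def get(self):
        """Deliver the next message; it stays tracked until ack() is called."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        del self._unacked[tag]

    def requeue_unacked(self):
        """Put delivered-but-unACKed messages back, as if the consumer died."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked[tag])
        self._unacked.clear()


class WorkerCrashed(Exception):
    """Simulates a worker dying in the middle of a long running task."""


def consume_once(broker, handler, ack_early):
    """Handle one message; return True if it was processed successfully."""
    delivery = broker.get()
    if delivery is None:
        return False
    tag, message = delivery
    if ack_early:
        broker.ack(tag)  # ACK as soon as the message is received
    try:
        handler(message)
    except WorkerCrashed:
        broker.requeue_unacked()  # nothing to requeue if we ACKed early
        return False
    if not ack_early:
        broker.ack(tag)  # ACK only after the job is done
    return True


def run_scenario(ack_early):
    """First worker crashes mid-task, second worker tries to pick the job up."""
    broker = InMemoryBroker()
    broker.publish("evacuate host-1")
    completed = []

    def crashing_handler(message):
        raise WorkerCrashed(message)

    consume_once(broker, crashing_handler, ack_early)
    consume_once(broker, completed.append, ack_early)
    return completed
```

In this toy model, run_scenario(ack_early=True) returns an empty list because the task is lost together with the crashed worker, while run_scenario(ack_early=False) returns the original message because it is redelivered to the second worker. The price of the late ACK is that a task may be delivered more than once, so the handler should be safe to repeat.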

Thanks for bringing this up. And I'll say what I usually say: you’re very welcome to contribute to Mistral; it should be fun.

Looking forward to hearing more from you about your discoveries.

[1] https://review.openstack.org/#/c/229186/
[2] https://bugs.launchpad.net/mistral/+bug/1502120

Renat Akhmerov
@ Mirantis Inc.

> On 02 Oct 2015, at 19:05, Roman Dobosz <roman.dobosz at intel.com> wrote:
> Hi all,
> Automatic evacuation (or resurrection, currently) is a topic
> which surfaces once in a while, but it isn't yet fully supported by
> OpenStack and/or by the cluster services. There have been some attempts
> to bring the feature into OpenStack, but it turns out it cannot be
> easily integrated. On the other hand, evacuation may be initiated
> from the outside using the Nova client or Nova API calls.
> I did some research on how it could be designed, using
> Russell Bryant's blog post[1] as a starting point. Apart from that, I've
> also taken high availability and reliability into consideration when
> designing the solution.
> Together with a coworker, we did a first PoC[2] to enable the cluster
> to perform evacuation. The idea behind that PoC was simple: provide an
> additional, small service which would trigger and supervise the
> evacuation process, and which would itself be triggered from the outside
> (in this example we were using the Pacemaker fencing facility, but it
> might be anything) using RabbitMQ directly. Those services run on the
> control plane in an active-active (AA) fashion.
> That worked well for us, so we started exploring other possibilities like
> oslo.messaging, to use it in the same manner as we did in the PoC.
> It turns out that the implementation will not be as easy, because
> oslo.messaging has no facility for sending an ACK from the
> client after the job is done (rather than as soon as it gets the
> message). We also looked at the existing OpenStack projects for a
> candidate which provides a service for managing long running tasks.
> There is the Mistral project, which gives us almost all the features we
> need. The one missing feature is HA of Mistral task execution.
> The question is: how could such a problem (long running tasks) be
> resolved in OpenStack?
> [1] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
> [2] https://github.com/dawiddeja/evacuationd
> -- 
> Cheers,
> Roman Dobosz
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
