[openstack-dev] [nova][mistral] Automatic evacuation as a long running task
roman.dobosz at intel.com
Fri Oct 2 13:05:19 UTC 2015
Automatic evacuation (or resurrection, currently) is a topic which
surfaces once in a while, but it isn't yet fully supported by
OpenStack and/or by the cluster services. There were some attempts to
bring the feature into OpenStack, but it turned out that it cannot be
easily integrated. On the other hand, evacuation may be executed from
the outside using the Nova client or Nova API calls for evacuation.
I did some research on how it could be designed, using Russell
Bryant's blog post as a starting point. Apart from that, I've also
taken high availability and reliability into consideration when
designing the solution.
Together with a coworker, I built a first PoC that enables a cluster to
perform evacuation. The idea behind that PoC was simple: provide an
additional, small service which supervises the evacuation process.
The process itself is triggered from the outside (in this example we
were using the Pacemaker fencing facility, but it might be anything)
using RabbitMQ directly. Those services run on the control plane in
active/active (AA) fashion.
That worked well for us, so we started exploring other possibilities,
like oslo.messaging, to use it in the same manner as we did in the PoC.
It turns out that the implementation will not be as easy, because
oslo.messaging has no facility for sending an ACK from the client after
the job is done (rather than as soon as it gets the message). We also
looked at the existing OpenStack projects for a candidate which
provides a service for managing long running tasks.
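
To make the "ACK after the job is done" requirement concrete, here is a
minimal sketch of a consumer talking to RabbitMQ directly through the
pika client. It is not the code from our PoC. The queue name, the JSON
message layout with a "host" key, and the evacuate callable are all
hypothetical and only serve as illustration.

```python
import json


def make_handler(evacuate):
    """Build a consumer callback that ACKs only after `evacuate` returns.

    `evacuate` is a hypothetical callable taking the failed host name.
    """

    def on_message(channel, method, properties, body):
        # Assumed message layout: {"host": "<failed compute host>"}
        request = json.loads(body)
        try:
            # Long running step. The message stays unacknowledged while
            # it runs, so the broker redelivers it if this worker dies.
            evacuate(request["host"])
        except Exception:
            # Hand the job back to the queue for another worker.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
            return
        # Acknowledge only once the job has finished.
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


def main(evacuate, queue="evacuation", amqp_host="localhost"):
    # Third-party client, imported here so the handler can be exercised
    # without a broker. Queue name and host are assumptions.
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=amqp_host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    # At most one unacknowledged job per worker at a time.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue=queue,
                          on_message_callback=make_handler(evacuate),
                          auto_ack=False)
    channel.start_consuming()
```

With several such workers consuming from the same queue, a job whose
worker disappears before the ACK is redelivered to one of the others.
Note that a callback which blocks for a long time in a blocking
connection also delays heartbeat handling, so a real service would
probably need to run the job in a separate thread.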
There is the Mistral project, which gives us almost all the features we
need. The one missing feature is HA of Mistral task execution.
The question is, how could such a problem (long running tasks) be resolved?