[openstack-dev] [heat] convergence cancel messages

Anant Patil anant.patil at hpe.com
Wed Feb 24 07:08:31 UTC 2016


Hi,

I would like the discuss various approaches towards fixing bug
https://launchpad.net/bugs/1533176

When convergence is on, and if the stack is stuck, there is no way to
cancel the existing request. This feature was not implemented in
convergence, as the user can again issue an update on an in-progress
stack. But if a resource worker is stuck, the new update will wait
for-ever on it and the update will not be effective.

The solution is to implement cancel request. Since the work for a stack
is distributed among heat engines, the cancel request will not work as
it does in legacy way. Many or all of the heat engines might be running
worker threads to provision a stack.

I could think of two options which I would like to discuss:

(a) When a user triggered cancel request is received, set the stack
current traversal to None or something else other than current
traversal. With this the new check-resources/workers will never be
triggered. This is okay as long as the worker(s) is not stuck. The
existing workers will finish running, and no new check-resource
(workers) will be triggered, and it will be a graceful cancel.  But the
workers that are stuck will be stuck for-ever till stack times-out.  To
take care of such cases, we will have to implement logic of "polling"
the DB at regular intervals (may be at each step() of scheduler task)
and bail out if the current traversal is updated. Basically, each worker
will "poll" the DB to see if the current traversal is still valid and if
not, stop itself. The drawback of this approach is that all the workers
will be hitting the DB and incur a significant overhead.  Besides, all
the stack workers irrespective of whether they will be cancelled or not,
will keep on hitting DB. The advantage is that it probably is easier to
implement. Also, if the worker is stuck in particular "step", then this
approach will not work.

(b) Another approach is to send cancel message to all the heat engines
when one receives a stack cancel request. The idea is to use the thread
group manager in each engine to keep track of threads running for a
stack, and stop the thread group when a cancel message is received. The
advantage is that the messages to cancel stack workers is sent only when
required and there is no other over-head. The draw-back is that the
cancel message is 'broadcasted' to all heat engines, even if they are
not running any workers for the given stack, though, in such cases, it
will be a just no-op for the heat-engine (the message will be gracefully
discarded).

Implementation for option (b) is for review:
https://review.openstack.org/#/c/279406/

I am seeking your input on these approaches. Please share any other
ideas if you have.

-- Anant



More information about the OpenStack-dev mailing list