[openstack-dev] [heat] convergence cancel messages
anant.patil at hpe.com
Wed Feb 24 09:33:55 UTC 2016
On 24-Feb-16 14:26, Anant Patil wrote:
> On 24-Feb-16 13:12, Clint Byrum wrote:
>> Excerpts from Anant Patil's message of 2016-02-23 23:08:31 -0800:
>>> I would like to discuss various approaches towards fixing bug
>>> When convergence is on, and if the stack is stuck, there is no way to
>>> cancel the existing request. This feature was not implemented in
>>> convergence, as the user can again issue an update on an in-progress
>>> stack. But if a resource worker is stuck, the new update will wait
>>> forever on it and the update will not be effective.
>>> The solution is to implement a cancel request. Since the work for a stack
>>> is distributed among heat engines, the cancel request will not work as
>>> it does in the legacy way. Many or all of the heat engines might be running
>>> worker threads to provision a stack.
>>> I could think of two options which I would like to discuss:
>>> (a) When a user-triggered cancel request is received, set the stack's
>>> current traversal to None or to some value other than the current
>>> traversal. With this, new check-resources/workers will never be
>>> triggered. This is okay as long as the worker(s) are not stuck. The
>>> existing workers will finish running, and no new check-resource
>>> (workers) will be triggered, and it will be a graceful cancel. But the
>>> workers that are stuck will stay stuck until the stack times out. To
>>> take care of such cases, we will have to implement logic for "polling"
>>> the DB at regular intervals (maybe at each step() of the scheduler task)
>>> and bail out if the current traversal is updated. Basically, each worker
>>> will "poll" the DB to see if the current traversal is still valid and if
>>> not, stop itself. The drawback of this approach is that all the workers
>>> will be hitting the DB and incur a significant overhead. Besides, all
>>> the stack workers, irrespective of whether they will be cancelled or not,
>>> will keep on hitting the DB. The advantage is that it is probably easier to
>>> implement. Also, if the worker is stuck in a particular "step", then this
>>> approach will not work.
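For illustration only, option (a) could be sketched roughly as below. This is a minimal sketch, not Heat code: `FakeStackDB`, `run_worker` and `TraversalCancelled` are hypothetical names, and the in-memory object merely stands in for the stack row in the DB. Each worker re-reads the current traversal before every step and bails out if it no longer matches its own.

```python
class TraversalCancelled(Exception):
    """Raised when the stack's current traversal no longer matches ours."""


class FakeStackDB:
    """In-memory stand-in for the stack table (hypothetical, not Heat's DB API)."""

    def __init__(self, traversal_id):
        self.current_traversal = traversal_id
        self.reads = 0  # counts DB hits, to make the polling overhead visible

    def get_current_traversal(self):
        self.reads += 1
        return self.current_traversal


def run_worker(db, my_traversal, steps, poll_every=1):
    """Run each callable in `steps`, polling the DB before every
    `poll_every`-th step and stopping if the traversal has changed."""
    done = 0
    for i, step in enumerate(steps):
        if i % poll_every == 0 and db.get_current_traversal() != my_traversal:
            raise TraversalCancelled(my_traversal)
        step()  # a step that blocks forever is never interrupted by this check
        done += 1
    return done
```

A cancel request would then amount to setting `db.current_traversal = None`. Every worker pays one DB read per polled step whether or not it is ever cancelled, and the check only runs between steps, which matches the two drawbacks noted above.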
>> I think this is the simplest option. And if the polling gets to be too
>> much, you can implement an observer pattern where one worker is just
>> assigned to poll the traversal and if it changes, RPC to the known
>> active workers that they should cancel any jobs using a now-cancelled
>> stack version.
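A rough sketch of the observer idea, under stated assumptions: `TraversalObserver` is a hypothetical name, and a `threading.Event` per worker stands in for the targeted RPC cancel message. Only the observer reads the traversal; workers just check their own event.

```python
import threading


class TraversalObserver:
    """Single poller that notifies registered workers when their traversal
    is no longer current (hypothetical sketch, not Heat's implementation)."""

    def __init__(self, read_current_traversal):
        self._read = read_current_traversal  # callable returning the current traversal
        self._workers = {}  # traversal_id -> list of cancel events

    def register(self, traversal_id):
        """Record an active worker and return the event it should watch."""
        event = threading.Event()
        self._workers.setdefault(traversal_id, []).append(event)
        return event

    def poll_once(self):
        """Read the traversal once and signal every worker on a stale one.
        Returns the number of workers notified."""
        current = self._read()
        notified = 0
        for traversal_id in list(self._workers):
            if traversal_id != current:
                for event in self._workers.pop(traversal_id):
                    event.set()  # stand-in for an RPC cancel to that worker
                    notified += 1
        return notified
```

The DB is read once per poll regardless of the number of workers, but the observer has to keep a registry of which workers are active for which traversal.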
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> Hi Clint,
> I see that the observer pattern is simple, but IMO it too is not efficient.
> To implement it, we will have to record in the DB the worker to engine-id
> relationship for all the workers, and then go through all of them and
> send targeted cancel messages. This will also need us to have a thread
> group manager in each engine so that it can stop the thread group
> running workers for the stack.
> Please help me understand if there is any particular disadvantage in
> option (b) that I am not missing.
Sorry, I meant I am missing :)
> -- Anant