[openstack-dev] [heat] convergence cancel messages
zbitter at redhat.com
Fri Aug 19 22:35:00 UTC 2016
On 19/08/16 09:55, Anant Patil wrote:
> What I'm suggesting is very close to that:
> (1) stack-cancel-update <stack_id> will start another update using the
> previous template/environment. We'll start rolling back; in-progress
> resources will be allowed to complete normally.
> (2) stack-cancel-update <stack_id> --no-rollback will set the
> traversal_id to None so no further resources will be updated;
> in-progress resources will be allowed to complete normally.
> (3) resource-mark-unhealthy <stack_id> <resource_id> ... <resource_id>
> Kill any threads running a CREATE or UPDATE on the given resources, mark
> as CHECK_FAILED if they are not already in UPDATE_FAILED, don't do
> anything else. If the resource was in progress, the stack won't progress
> further, other resources currently in-progress will complete, and if
> rollback is enabled and no other traversal has started then it will roll
> back to the previous template/environment.
> I have started implementation of the above three mechanisms. The first
> two are implemented in https://review.openstack.org/#/c/357618
This looks great, thanks! That covers both our internal use of
update-cancel and the current user API update-cancel nicely.
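To make the traversal_id part of (1) and (2) concrete, here is a rough sketch of how I understand the check. It is a toy model, not Heat's actual code: the Stack class, cancel_update() and check_resource() are hypothetical names made up for illustration, and in-progress work is simply never interrupted.

```python
import uuid


class Stack:
    """Toy stand-in for a stack record; not Heat's real model."""

    def __init__(self):
        self.current_traversal = str(uuid.uuid4())
        self.updated = []

    def cancel_update(self, rollback=True):
        if rollback:
            # Case (1): a new traversal (the rollback) supersedes the old one.
            self.current_traversal = str(uuid.uuid4())
        else:
            # Case (2): no traversal is current, so nothing further is scheduled.
            self.current_traversal = None
        return self.current_traversal


def check_resource(stack, traversal_id, name):
    """Start work on a resource only if its traversal is still current."""
    if stack.current_traversal != traversal_id:
        return False  # stale traversal: skip without touching the resource
    stack.updated.append(name)
    return True
```

In both cases nothing is killed: work that belongs to the old traversal just stops being picked up, which is why in-progress resources complete normally.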
> Note that (2) needs a change in the heat client (openstack client?) to
> have a --no-rollback option.
Yeah, and also a (very minor) REST API change. I'd be in favour of
trying to get this in before Newton FF, it'd be really useful to have.
> (3) is a bit of a long haul, and needs:
> https://review.openstack.org/343076 : Adds mechanism to interrupt
> convergence worker threads
> https://review.openstack.org/301483 : Mechanism to send cancel message
> and cancel worker upon receiving messages
Another thing I forgot is that when we delete a stack in the legacy
path, we cancel all the threads working on it, so that any in-progress
update/create is stopped (you're about to delete that stuff anyway, so
you might as well not bother with anything else). The lack of this
functionality in convergence is causing problems for some users. It
looks like this patch
is intended to build on the previous two to resolve that:
(This is actually going to be much better than the old behaviour,
because it turned out that cancelling threads was very much not the
right thing to do, and it's much better to stop them at a yield point.)
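For anyone unfamiliar with the idea of stopping at a yield point, a minimal sketch follows. It assumes a generator-based task whose yields mark the safe places to stop; update_resource(), run_task() and Cancelled are hypothetical names for illustration, not the code in the patches under review.

```python
import threading


class Cancelled(Exception):
    """Raised by the runner when a task is stopped at a yield point."""


def update_resource(log):
    """Hypothetical resource task; each yield marks a safe place to stop."""
    log.append("request sent")
    yield
    log.append("polled once")
    yield
    log.append("complete")


def run_task(task, cancel_event):
    """Step the task, honouring cancellation only between steps."""
    for _ in task:
        if cancel_event.is_set():
            task.close()  # raises GeneratorExit inside the task at the yield
            raise Cancelled()
```

The step that is already running always finishes before the cancel flag is checked, so the task is never torn down halfway through an operation.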
So I think all of the above apart from the API/client change for (2) are
going to be critical to land for Newton. (They're all in a sense bugs at
> Apart from the above two, I am implementing the actual patch which will
> leverage the above two to complete resource-mark-unhealthy feature in
Great! Hopefully people will rarely need this, but it'll be much more
comfortable unleashing convergence on the world if we know that this
exists as a circuit-breaker in case something does get stuck.
Let me know if I can help with any of this stuff without stepping on any
toes (time zones unfortunately make it hard for you and me to
co-ordinate). I'll at least try to circle back regularly to the reviews.