Open Stack

Fri Aug 19 22:35:00 UTC 2016

On 19/08/16 09:55, Anant Patil wrote:
>
>     What I'm suggesting is very close to that:
>
>     (1) stack-cancel-update <stack_id> will start another update using the
>     previous template/environment. We'll start rolling back; in-progress
>     resources will be allowed to complete normally.
>     (2) stack-cancel-update <stack_id> --no-rollback will set the
>     traversal_id to None so no further resources will be updated;
>     in-progress resources will be allowed to complete normally.
>     (3) resource-mark-unhealthy <stack_id> <resource_id> ... <resource_id>
>     Kill any threads running a CREATE or UPDATE on the given resources, mark
>     as CHECK_FAILED if they are not already in UPDATE_FAILED, don't do
>     anything else. If the resource was in progress, the stack won't progress
>     further, other resources currently in-progress will complete, and if
>     rollback is enabled and no other traversal has started then it will roll
>     back to the previous template/environment.
>
> I have started implementation of the above three mechanisms. The first
> two are implemented in https://review.openstack.org/#/c/357618

This looks great, thanks! That covers both our internal use of 
update-cancel and the current user API update-cancel nicely.
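
For anyone following along, here is a rough sketch of how I read the semantics of the three mechanisms. It is a toy model only, not the code in the review: ToyStack, cancel_update, mark_unhealthy and should_start are hypothetical names, and the thread-killing and automatic rollback parts of (3) are deliberately left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyStack:
    """Hypothetical stand-in for a convergence stack; not Heat's real model."""
    template: str
    prev_template: str
    traversal_id: Optional[str] = field(default_factory=lambda: uuid.uuid4().hex)
    resources: Dict[str, str] = field(default_factory=dict)  # name -> status


def cancel_update(stack: ToyStack, rollback: bool = True) -> None:
    if rollback:
        # (1) Start another update using the previous template. The new
        # traversal id makes any work queued for the old traversal stale.
        stack.template = stack.prev_template
        stack.traversal_id = uuid.uuid4().hex
    else:
        # (2) --no-rollback: no current traversal, so nothing further starts.
        stack.traversal_id = None


def mark_unhealthy(stack: ToyStack, *names: str) -> None:
    # (3) Mark as CHECK_FAILED unless already UPDATE_FAILED. Interrupting
    # the worker thread itself is not modelled here.
    for name in names:
        if stack.resources.get(name) != "UPDATE_FAILED":
            stack.resources[name] = "CHECK_FAILED"


def should_start(stack: ToyStack, traversal_id: str) -> bool:
    # A worker only begins a new resource if its traversal is still current.
    # Resources already in progress are not checked again, so they complete.
    return stack.traversal_id is not None and stack.traversal_id == traversal_id
```

In both cancel cases nothing is interrupted; the only effect is that a worker asking should_start() with the old traversal id gets False.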

> Note that the (2) needs a change in heat client (openstack client?) to
> have a --no-rollback option.

Yeah, and also a (very minor) REST API change. I'd be in favour of 
trying to get this in before the Newton feature freeze (FF); it'd be 
really useful to have.

> (3) is a bit of long haul, and needs:
> https://review.openstack.org/343076 : Adds mechanism to interrupt
> convergence worker threads
> https://review.openstack.org/301483 : Mechanism to send cancel message
> and cancel worker upon receiving messages

Another thing I forgot: deleting a stack used to cancel all the threads 
working on it, so any in-progress update/create was stopped (you're 
about to delete that stuff anyway, so you might as well not bother with 
anything else). The lack of this functionality in convergence is causing 
problems for some users. It looks like this patch is intended to build 
on the previous two to resolve that:

https://review.openstack.org/#/c/354000/

(This is actually going to be much better than the old behaviour, 
because it turned out that cancelling threads was very much not the 
right thing to do, and it's much better to stop them at a yield point.)
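
To illustrate what stopping at a yield point means, here is a minimal sketch assuming the work is written as a generator. It is not Heat's scheduler or the code in the patch above; resource_task and run_until_cancelled are hypothetical names. The cancel flag is only looked at between steps, so a step is never torn down half-way and the task's own cleanup still runs.

```python
import threading
from typing import Iterator, List


def resource_task(log: List[str]) -> Iterator[None]:
    """Hypothetical multi-step resource operation."""
    try:
        for step in ("create", "wait", "configure"):
            log.append(step)  # each step runs to completion
            yield             # yield point: a safe place to stop
    finally:
        # Runs on normal completion and when the generator is closed.
        log.append("cleanup")


def run_until_cancelled(task: Iterator[None], cancel: threading.Event) -> bool:
    """Drive the task step by step; return True if it ran to completion."""
    for _ in task:
        if cancel.is_set():
            # close() raises GeneratorExit at the yield, so the task's
            # finally block executes before we return.
            task.close()
            return False
    return True


if __name__ == "__main__":
    log: List[str] = []
    cancel = threading.Event()
    cancel.set()  # cancel requested before the task starts
    print(run_until_cancelled(resource_task(log), cancel), log)
```

Killing the thread instead would be the equivalent of stopping somewhere inside a step, with no chance to run the cleanup.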

So I think all of the above apart from the API/client change for (2) are 
going to be critical to land for Newton. (They're all in a sense bugs at 
the moment.)

> Apart from the above two, I am implementing the actual patch which will
> leverage the above two to complete resource-mark-unhealthy feature in
> convergence.

Great! Hopefully people will rarely need this, but it'll be much more 
comfortable unleashing convergence on the world if we know that this 
exists as a circuit-breaker in case something does get stuck.

Let me know if I can help with any of this stuff without stepping on any 
toes (time zones unfortunately make it hard for you and me to 
co-ordinate). I'll at least try to circle back regularly to the reviews.

cheers,
Zane.

[openstack-dev] [heat] convergence cancel messages
