Open Stack

Wed Jan 14 12:24:19 UTC 2015

On 09-Jan-15 19:19, Zane Bitter wrote:
> On 09/01/15 01:07, Angus Salkeld wrote:
>> I am not in favor of the --continue as an API. I'd suggest responding to
>> resource timeouts and if there is no response from the task, then
>> re-start (continue)
>> the task.
> 
> Yeah, I am not in favour of a new API either. In fact, I believe we 
> already have this functionality: if you do another update with the same 
> template and parameters then it will break the lock and continue the 
> update if the engine running the previous update has failed. And when we 
> switch over to convergence it will still do the Right Thing without any 
> extra implementation effort.
> 
> There is one improvement we can make to the API though: in Juno, Ton 
> added a PATCH method to stack update such that you can reuse the 
> existing parameters without specifying them again. We should extend this 
> to the template also, so you wouldn't have to supply any data to get 
> Heat to start another update with the same template and parameters.
> 
> I'm not sure if there is a blueprint for this already; co-ordinate with 
> Ton if you are planning to work on it.
> 
> cheers,
> Zane.
> 
IMHO, there are two different things here:

1. Failures external to the Heat engine (out-of-band failures). We need
a convenient way to issue a stack-update on a stack that failed due to
out-of-band failures. When a stack fails because of service
unavailability or infrastructure issues, operators/admins can fix those
issues and then restart the provisioning, or tell users to restart it.
Currently this is done by issuing a stack-update on the failed stack. It
would be convenient to have an option on the stack-update command that
retries the stack operation without having to specify the template,
parameters and environment again. Users shouldn't need to supply any
data again to restart the update of a failed stack.

Folks working on Horizon would definitely need something like this.
The Horizon UI should not have to save a local copy of the template,
parameters and environment supplied by the user; it can rely on Heat,
which already has the data. It would be convenient for Horizon to issue
a --retry for stack-create or stack-update when the stack fails due to
external problems that the operators/users then fix.
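
To make the idea concrete, here is a rough sketch of what a retry could
do on the Heat side. This is not Heat code: StoredStack,
build_retry_request and the request layout are hypothetical names I made
up for illustration. The only assumption is that Heat has already
persisted the template, parameters and environment of the failed stack.

```python
from dataclasses import dataclass, field

# Stack states for which a retry makes sense in this sketch.
FAILED_STATES = ("CREATE_FAILED", "UPDATE_FAILED")


@dataclass
class StoredStack:
    """Hypothetical stand-in for the data Heat already keeps per stack."""
    name: str
    status: str
    template: dict
    parameters: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)


def build_retry_request(stack):
    """Build an update request purely from stored data.

    The caller (CLI, Horizon, ...) supplies nothing but the stack itself.
    """
    if stack.status not in FAILED_STATES:
        raise ValueError(
            "stack %s is %s, nothing to retry" % (stack.name, stack.status))
    return {
        "stack_name": stack.name,
        "template": stack.template,
        # Copy so the request cannot mutate the stored values.
        "parameters": dict(stack.parameters),
        "environment": dict(stack.environment),
    }
```

With something like this, Horizon would only need the stack name or id
to trigger the retry.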

2. Internal failures (the Heat engine itself failing). This is about
continuing a stack operation even after a Heat engine fails due to an
internal error. I think Vishnu is talking about this part. When an
engine fails, other engines should be able to take over the task of
provisioning the stack without any user intervention. No new API or
option to stack-create or stack-update is needed. Something like a
periodic timer is needed to check whether the engine provisioning a
stack is still up. If it is not, the lock is stolen and the stack
operation is restarted, maybe by issuing stack-update again with the
same template and parameters. This is an interim solution ahead of the
continuous observer: the stack timer would periodically check for stacks
that are "stuck" because their engine failed, and issue another update
or something equivalent so that other Heat engines can proceed. Or, as a
first step, like Steve said, put the stack into the FAILED state and let
the user initiate a stack-update (probably with the option described in
1).
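
A minimal sketch of the periodic check, covering both variants (steal
the lock and resume, or just mark the stack FAILED). Again, this is not
Heat code: Stack, find_stuck_stacks and recover_stuck_stacks are
hypothetical names, and how an engine learns which other engines are
alive is left out and simply passed in as a set of engine ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stack:
    """Hypothetical view of a stack as seen by the periodic check."""
    name: str
    status: str                 # e.g. "UPDATE_IN_PROGRESS"
    lock_owner: Optional[str]   # id of the engine holding the stack lock


def find_stuck_stacks(stacks, alive_engines):
    """Return in-progress stacks whose lock is held by a dead engine."""
    return [
        s for s in stacks
        if s.status.endswith("IN_PROGRESS")
        and s.lock_owner is not None
        and s.lock_owner not in alive_engines
    ]


def recover_stuck_stacks(stacks, alive_engines, this_engine, resume=True):
    """Run one pass of the timer; return names of the stacks handled.

    resume=True  : steal the lock so this engine can re-issue the update.
    resume=False : first-step variant, release the lock and mark the
                   stack FAILED so the user can start a stack-update.
    """
    handled = []
    for stack in find_stuck_stacks(stacks, alive_engines):
        if resume:
            stack.lock_owner = this_engine
        else:
            stack.lock_owner = None
            stack.status = stack.status.replace("IN_PROGRESS", "FAILED")
        handled.append(stack.name)
    return handled
```

An engine would call recover_stuck_stacks() from its periodic task and,
in the resume case, follow up with an update using the stored template
and parameters.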

Please share your thoughts.

- Anant

[openstack-dev] [Heat] Precursor to Phase 1 Convergence
