[openstack-dev] [Heat] Blueprint for retry function with idempotency in Heat

Mitsuru Kanabuchi kanabuchi.mitsuru at po.ntts.co.jp
Fri Oct 18 10:38:47 UTC 2013


On Fri, 18 Oct 2013 10:34:11 +0100
Steven Hardy <shardy at redhat.com> wrote:
> IMO we don't want to go down the path of retry-loops in Heat, or scheduled
> self-healing. We should just allow the user to trigger a stack update from
> a failed state (CREATE_FAILED, or UPDATE_FAILED), and then they can define
> their own policy on when recovery happens by triggering a stack update.

I think "retry" has two different meanings in this thread.
I'd like to clarify what each one means.

=============================
1) Stack Creation retry

  proposed here:
    https://blueprints.launchpad.net/heat/+spec/retry-failed-update

  - trigger: stack update to failed stack
  - function: replace failed resource and go ahead

2) API retry

  proposed here(Our blueprint):
    https://blueprints.launchpad.net/heat/+spec/support-retry-with-idempotency

  - trigger: no API response, or an unexpected response code
  - function: retry the API request until it returns the expected response code or a retry limit is reached
=============================
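To make 2) concrete, here is a minimal sketch of the retry loop we have in mind. The `call_api` callable and its `client_token` parameter are hypothetical stand-ins for a client call that supports an idempotency token; none of these names are Heat's actual code.

```python
import time
import uuid

def retry_with_idempotency(call_api, expected_status, max_retries=3,
                           backoff=2.0):
    """Retry an API call until the expected status code or a retry limit.

    Hypothetical sketch: `call_api` stands in for a client call that
    accepts an idempotency token; it returns (status_code, body) or
    raises IOError when no response arrives at all.
    """
    # One token for all attempts, so a request that actually succeeded
    # server-side is not duplicated when we retry after a lost response.
    client_token = str(uuid.uuid4())
    for attempt in range(1, max_retries + 1):
        try:
            status, body = call_api(client_token=client_token)
        except IOError:
            status = None  # no response at all; treat as retryable
        if status == expected_status:
            return body
        time.sleep(backoff * attempt)  # simple linear backoff
    # Over the retry limit: the stack would move to XXX_FAILED here.
    raise RuntimeError("retry limit reached")
```

The key point is that the token is generated once, outside the loop: every retry carries the same token, so the backend can recognize and deduplicate a request whose first response was lost.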

Our proposal is 2).
Once the retry limit is exceeded, the stack would change to an XXX_FAILED status.
I think this is the same as Heat's current behavior;
we won't change the stack state transition mechanism.

I understand proposal 1) aims to restart the processing of a failed stack.
These are subjects at different layers, and both functions will be able to coexist.


On Fri, 18 Oct 2013 10:34:11 +0100
Steven Hardy <shardy at redhat.com> wrote:

> On Fri, Oct 18, 2013 at 12:13:45PM +1300, Steve Baker wrote:
> > On 10/18/2013 01:54 AM, Mitsuru Kanabuchi wrote:
> > > Hello Mr. Clint,
> > >
> > > Thank you for your comment and prioritization.
> > > I'm glad to discuss this with you, since you feel the same issue.
> > >
> > >> I took the liberty of targeting your blueprint at icehouse. If you don't
> > >> think it is likely to get done in icehouse, please raise that with us at
> > >> the weekly meeting if you can and we can remove it from the list.
> > > Basically, this blueprint is targeted at the Icehouse release.
> > >
> > > However, the schedule depends on the following blueprint:
> > >   https://blueprints.launchpad.net/nova/+spec/idempotentcy-client-token
> > >
> > > We're going to start the implementation in Heat after ClientToken is implemented.
> > > I think ClientToken is a necessary function for this blueprint, and an important function for other callers!
> > Can there not be a default retry implementation which deletes any
> > ERRORed resource and attempts the operation again? Then specific
> > resources can switch to ClientToken as they become available.
> 
> Yes, I think this is the way to go - have logic in every resource's
> handle_update (which would probably be common with check_create_complete),
> which checks the status of the underlying physical resource, and if it's
> not in the expected status, we replace it.
> 
> This probably needs to be a new flag or API operation, as it clearly has
> the possibility to be more destructive than a normal update (may delete
> resources which have not changed in the template, but are in a bad state)
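A rough illustration of the per-resource check described above, with hypothetical method and attribute names (this is not the actual Heat plugin interface):

```python
class ResourceSketch(object):
    """Hypothetical sketch: on update, inspect the real (physical)
    resource and replace it if it is not in the expected state.
    `client` stands in for a service client with get_status/delete/create.
    """
    EXPECTED_STATUS = "ACTIVE"

    def __init__(self, client, physical_id=None):
        self.client = client
        self.physical_id = physical_id

    def _physical_status(self):
        # This check could be shared with check_create_complete.
        return self.client.get_status(self.physical_id)

    def handle_update(self, replace_on_error=False):
        if self._physical_status() == self.EXPECTED_STATUS:
            return "unchanged"
        if replace_on_error:
            # Potentially destructive: delete and recreate the physical
            # resource even though the template definition is unchanged,
            # hence the suggestion of an explicit flag or API operation.
            self.client.delete(self.physical_id)
            self.physical_id = self.client.create()
            return "replaced"
        return "error"
```

The explicit `replace_on_error` flag reflects the point above: the replacement path is more destructive than a normal update, so it should be opt-in.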
> 
> > > On Wed, 16 Oct 2013 23:32:22 -0700
> > > Clint Byrum <clint at fewbar.com> wrote:
> > >
> > >> Excerpts from Mitsuru Kanabuchi's message of 2013-10-16 04:47:08 -0700:
> > >>> Hi all,
> > >>>
> > >>> We proposed a blueprint that supports an API retry function with idempotency for Heat.
> > >>> Please review the blueprint.
> > >>>
> > >>>   https://blueprints.launchpad.net/heat/+spec/support-retry-with-idempotency
> > >>>
> > >> This looks great. It addresses some of what I've struggled with while
> > >> thinking of how to handle the retry problem.
> > >>
> > >> I went ahead and linked bug #1160052 to the blueprint, as it is one that
> > >> I've been trying to get a solution for.
> > >>
> > >> I took the liberty of targeting your blueprint at icehouse. If you don't
> > >> think it is likely to get done in icehouse, please raise that with us at
> > >> the weekly meeting if you can and we can remove it from the list.
> > >>
> > >> Note that there is another related blueprint here:
> > >>
> > >> https://blueprints.launchpad.net/heat/+spec/retry-failed-update
> > >>
> > >>
> > 
> > Has any thought been given to where the policy should be specified for
> > how many retries to attempt?
> > 
> > Maybe sensible defaults should be defined in the python resources, and a
> > new resource attribute can allow an override in the template on a
> > per-resource basis (I'm referring to an attribute at the same level as
> > Type, Properties, Metadata)
> 
> IMO we don't want to go down the path of retry-loops in Heat, or scheduled
> self-healing. We should just allow the user to trigger a stack update from
> a failed state (CREATE_FAILED, or UPDATE_FAILED), and then they can define
> their own policy on when recovery happens by triggering a stack update.
> 
> This is basically what's described for discussion here:
> http://summit.openstack.org/cfp/details/95
> 
> I personally think the scheduled self-healing is a bad idea, but the
> convergence (as a special type of stack update) is a good one.
> 
> For automatic recovery, we should instead be looking at triggering things
> via Ceilometer alarms, so we can move towards removing all periodic task
> stuff from Heat (because it doesn't scale, and it presents major issues
> when scaling out)
> 
> Steve
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--------------------
  Mitsuru Kanabuchi
    NTT Software Corporation
    E-Mail : kanabuchi.mitsuru at po.ntts.co.jp



