[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Robert Collins robertc at robertcollins.net
Wed Feb 5 01:34:50 UTC 2014


On 5 February 2014 13:14, Zane Bitter <zbitter at redhat.com> wrote:


> That's not a great example, because one DB server depends on the other,
> forcing them into updating serially anyway.
>
> I have to say that even in general, this whole idea about applying update
> policies to non-grouped resources doesn't make a whole lot of sense to me.
> For non-grouped resources you control the resource definitions individually
> - if you don't want them to update at a particular time, you have the option
> of just not updating them.

Well, I don't particularly like the idea of doing thousands of
discrete heat stack-update calls, which would seem to be what you're
proposing.

On groups: autoscale groups are a problem for security-minded
deployments because every server has identical resources (today), and
we very much want discrete credentials per server - at least this is
my understanding of the reason we're not using scaling groups in
TripleO.

> Where you _do_ need it is for scaling groups where every server is based on
> the same launch config, so you need a way to control the members
> individually - by batching up operations (done), adding delays (done) or,
> even better, notifications and callbacks.
>
> So it seems like doing 'rolling' updates for any random subset of resources
> is effectively turning Heat into something of a poor-man's workflow service,
> and IMHO that is probably a mistake.

I meant to reply to the other thread, but here is just as good :) -
Heat as a way to describe the intended state, with Heat taking care of
the transitions, is a brilliant model. It absolutely implies a bunch
of workflows - the AWS update policy is probably the key example.

Being able to gracefully, *automatically* work through a transition
between two defined states, allowing the nodes in question to take
care of their own needs along the way, seems like a pretty core
function to fit inside Heat itself. It's not at all the same as
'allow users to define arbitrary workflows'.

-Rob

> What we do need for all resources (not just scaling groups) is a way for the
> user to say "for this particular resource, notify me when it has updated
> (but, if possible, before we have taken any destructive actions on it), give
> me a chance to test it and accept or reject the update". For example, when
> you resize a server, give the user a chance to confirm or reject the change
> at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
> server during an update, give the user a chance to test the new server and
> either keep it (continue on and delete the old one) or not (roll back). Or
> when you replace a server in a scaling group, notify the load balancer _or
> some other thing_ (e.g. OpenShift broker node) that a replacement has been
> created and wait for it to switch over to the new one before deleting the
> old one. Or, of course, when you update a server to some new config, give
> the user a chance to test it out and make sure it works before continuing
> with the stack update. All of these use cases can, I think, be solved with a
> single feature.
>
> The open questions for me are:
> 1) How do we notify the user that it's time to check on a resource?
> (Marconi?)

This is the graceful update stuff I referred to in my mail to Clint -
the proposal from hallway discussions in HK was to do this by
notifying the server itself (that way we don't create a centralised
point of failure). I can see, though, that in a general sense not all
resources are servers. But - how about allowing the user to specify
where to notify (notifying is always done by setting a value in
metadata somewhere) - users can then pull that out themselves however
they want to. Adding push notifications is orthogonal IMO - we'd like
that for all metadata changes, for instance.
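To make the metadata-notification idea concrete, here is a minimal
Python sketch of a per-server agent that polls its metadata for a
pending-update flag and acks it. All names here (the `pending_update`
key, the callables) are invented for illustration - Heat defines no
such contract today:

```python
# Hypothetical sketch of the "notify via metadata" proposal: a per-server
# agent polls its Heat-provided metadata until an update notification
# appears, prepares itself, then signals readiness (e.g. via a
# WaitCondition handle). Key names are assumptions, not Heat's API.
import json
import time


def poll_for_update(fetch_metadata, signal_ready, interval=5.0):
    """Poll metadata until an update notification appears, then ack it.

    fetch_metadata: callable returning the server's metadata as a dict
    signal_ready:   callable that completes the wait handle
    """
    while True:
        md = fetch_metadata()
        pending = md.get("pending_update")  # hypothetical key
        if pending:
            # Node-specific preparation would go here (drain
            # connections, stop services, etc.) before acking.
            signal_ready({"status": "SUCCESS", "id": pending["id"]})
            return pending
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated metadata source: the flag appears on the second poll.
    polls = iter([{}, {"pending_update": {"id": "update-1"}}])
    acks = []
    result = poll_for_update(lambda: next(polls), acks.append, interval=0)
    print(json.dumps(result))
```

Because the notification is just a metadata value, any resource that
can read its own metadata - not only servers - could consume it in
whatever way suits it.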

> 2) How does the user ack/nack? (You're suggesting reusing WaitCondition, and
> that makes sense to me.)

The server would use a WaitCondition, yes.

> 3) How do we break up the operations so the notification occurs at the right
> time? (With difficulty, but it should be do-able.)

Just wrap the existing operations - if <should notify> then:
notify-wait-do, otherwise just do.
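As a rough sketch of that wrapping in Python - function names are
placeholders, not actual Heat internals:

```python
# Hypothetical sketch of "if <should notify> then notify-wait-do,
# otherwise just do": the existing update operation is wrapped so that,
# when a resource opts in, the update is gated on a user acknowledgement
# (e.g. a WaitCondition). All names here are illustrative assumptions.
def guarded_update(resource, do_update, notify=None, wait_for_ack=None):
    """Apply do_update, optionally gated on a user acknowledgement."""
    if notify is not None:
        notify(resource)                 # e.g. set a flag in metadata
        if not wait_for_ack(resource):   # e.g. block on a WaitCondition
            raise RuntimeError("update rejected for %s" % resource)
    return do_update(resource)


if __name__ == "__main__":
    log = []
    outcome = guarded_update(
        "server-1",
        do_update=lambda r: log.append(("update", r)) or "done",
        notify=lambda r: log.append(("notify", r)),
        wait_for_ack=lambda r: True,     # user accepted the change
    )
    print(outcome, log)
```

Resources that don't opt in take the plain path unchanged, which keeps
the feature out of the way for existing templates.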

> 4) How does the user indicate for which resources they want to be notified?
> (Inside an update_policy? Another new directive at the
> type/properties/depends_on/update_policy level?)

I would say per resource.

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


