[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Steven Dake sdake at redhat.com
Wed Feb 5 15:35:37 UTC 2014


On 02/04/2014 06:34 PM, Robert Collins wrote:
> On 5 February 2014 13:14, Zane Bitter <zbitter at redhat.com> wrote:
>
>
>> That's not a great example, because one DB server depends on the other,
>> forcing them into updating serially anyway.
>>
>> I have to say that even in general, this whole idea about applying update
>> policies to non-grouped resources doesn't make a whole lot of sense to me.
>> For non-grouped resources you control the resource definitions individually
>> - if you don't want them to update at a particular time, you have the option
>> of just not updating them.
> Well, I don't particularly like the idea of doing thousands of
> discrete heat stack-update calls, which would seem to be what you're
> proposing.
>
> On groups: autoscale groups are a problem for security-minded
> deployments because every server has identical resources (today), and
> we very much want discrete credentials per server - at least this is
> my understanding of the reason we're not using scaling groups in
> TripleO.
>
>> Where you _do_ need it is for scaling groups where every server is based on
>> the same launch config, so you need a way to control the members
>> individually - by batching up operations (done), adding delays (done) or,
>> even better, notifications and callbacks.
>>
>> So it seems like doing 'rolling' updates for any random subset of resources
>> is effectively turning Heat into something of a poor-man's workflow service,
>> and IMHO that is probably a mistake.
> I mean to reply to the other thread, but here is just as good :) -
> heat as a way to describe the intended state, with heat taking care of
> the transitions, is a brilliant model. It absolutely implies a bunch of
> workflows - the AWS update policy is probably the key example.
>
> Being able to gracefully, *automatically* work through a transition
> between two defined states, allowing the nodes in question to take
> care of their own needs along the way, seems like a pretty core
> function to fit inside Heat itself. It's not at all the same as
> 'allow users to define arbitrary workflows'.
>
> -Rob
Rob,

I'm not precisely certain what you're proposing, but I think we need to
take care not to turn the Heat DSL into a full-fledged programming
language.  IMO thousands of updates done through Heat is exactly the
kind of thing a third-party service should drive - e.g. one controlling
workflow.  Clearly there is a workflow gap in OpenStack, and possibly
the thing doing the thousands of updates should be a workflow service
rather than TripleO, but workflow is out of scope for Heat proper.
Such a workflow service could potentially fit in the Orchestration
program alongside Heat and Autoscaling.  It is too bad there isn't a
workflow service already, because we are getting a lot of pressure to
make Heat fill this gap.  I personally believe filling this gap with
Heat would be a mistake; the correct course of action would be for a
workflow service to emerge to fill this need (and depend on Heat for
orchestration).

I believe this may be what Zane is reacting to; the Heat community
would like to avoid making the DSL more programmable, because that
would make it harder to use and support.  The parameters, resources,
and outputs sections of the DSL are difficult enough for new folks to
pick up, and that is only three things to understand...

Regards
-steve

>
>> What we do need for all resources (not just scaling groups) is a way for the
>> user to say "for this particular resource, notify me when it has updated
>> (but, if possible, before we have taken any destructive actions on it), give
>> me a chance to test it and accept or reject the update". For example, when
>> you resize a server, give the user a chance to confirm or reject the change
>> at the VERIFY_RESIZE step (Trove requires this). Or when you replace a
>> server during an update, give the user a chance to test the new server and
>> either keep it (continue on and delete the old one) or not (roll back). Or
>> when you replace a server in a scaling group, notify the load balancer _or
>> some other thing_ (e.g. OpenShift broker node) that a replacement has been
>> created and wait for it to switch over to the new one before deleting the
>> old one. Or, of course, when you update a server to some new config, give
>> the user a chance to test it out and make sure it works before continuing
>> with the stack update. All of these use cases can, I think, be solved with a
>> single feature.
>>
>> The open questions for me are:
>> 1) How do we notify the user that it's time to check on a resource?
>> (Marconi?)
> This is the graceful update stuff I referred to in my mail to Clint -
> the proposal from hallway discussions in HK was to do this by
> notifying the server itself (that way we don't create a centralised
> point of failure). I can see, though, that in a general sense not all
> resources are servers. But how about allowing users to specify where
> to notify (notifying always means setting a value in metadata
> somewhere)? Users can then pull that out themselves however they want
> to. Adding push notifications is orthogonal IMO - we'd like that for
> all metadata changes, for instance.
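To make the pull model concrete: the server would poll its own resource
metadata and act when a flag shows up. The shape below is purely
hypothetical - no schema has been agreed, and nothing in Heat writes
this today:

  # hypothetical metadata section the engine would set on the resource
  metadata:
    update:
      pending: true    # engine signals an update is waiting on this node
      action: REPLACE  # what Heat intends to do to the resource
      # presigned wait-condition URL the node signals when it is ready
      callback: <handle URL>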
>
>> 2) How does the user ack/nack? (You're suggesting reusing WaitCondition, and
>> that makes sense to me.)
> The server would use a WaitCondition, yes.
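A minimal sketch of that ack/nack plumbing, using the CFN-compatible
wait-condition resources Heat already ships (resource names are
illustrative):

  resources:
    update_ack_handle:
      type: AWS::CloudFormation::WaitConditionHandle
    update_ack:
      type: AWS::CloudFormation::WaitCondition
      properties:
        Handle: {get_resource: update_ack_handle}
        Timeout: 600  # seconds to wait for the signal

Whoever does the verification then PUTs a small JSON document of the
form {"Status": "SUCCESS", "Reason": "...", "UniqueId": "...",
"Data": "..."} (or Status FAILURE to nack) to the handle's presigned
URL, and the stack operation proceeds or rolls back accordingly.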
>
>> 3) How do we break up the operations so the notification occurs at the right
>> time? (With difficulty, but it should be do-able.)
> Just wrap the existing operations - if <should notify> then:
> notify-wait-do, otherwise just do.
>
>> 4) How does the user indicate for which resources they want to be notified?
>> (Inside an update_policy? Another new directive at the
>> type/properties/depends_on/update_policy level?)
> I would say per resource.
>
> -Rob
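In template terms, a per-resource opt-in might look like the sketch
below. To be clear, the notify_on_update key is hypothetical - defining
the real syntax is exactly what this spec needs to settle:

  resources:
    db_server:
      type: OS::Nova::Server
      # properties elided
      update_policy:
        notify_on_update: true  # hypothetical: notify-wait-do on update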
>
>



