[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Zane Bitter zbitter at redhat.com
Wed Feb 5 00:14:09 UTC 2014


On 03/02/14 17:09, Clint Byrum wrote:
> Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
>>> So, I wrote the original rolling updates spec about a year ago, and the
>>> time has come to get serious about implementation. I went through it and
>>> basically rewrote the entire thing to reflect the knowledge I have
>>> gained from a year of working with Heat.
>>>
>>> Any and all comments are welcome. I intend to start implementation very
>>> soon, as this is an important component of the HA story for TripleO:
>>>
>>> https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates
>>
>> Hi Clint, thanks for pushing this.
>>
>> First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be 2 different entities. The second just looks like a parametrization of the first (growth_factor=1?).
>
> Perhaps they can just be one. Until I find parameters which would need
> to mean something different, I'll just use UpdatePattern.
>
>>
>> I then feel that using (abusing?) depends_on for update pattern is a bit weird. Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute feels better (although I would probably use a property). I guess my main question is around the meaning of using the update pattern on a server instance. I think I see what you want to do for the group, where child_updating would return a number, but I have no idea what it means for a single resource. Could you detail the operation a bit more in the document?
>>
>
> I would be o-k with adding another keyword. The idea in abusing depends_on
> is that it changes the core language less. Properties is definitely out
> for the reasons Christopher brought up: properties are really meant to
> be for the resource's end target only.

Agree, -1 for properties - those belong to the resource, and this data 
belongs to Heat.

> UpdatePolicy in cfn is a single string, and causes very generic rolling

Huh?

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

Not only is it not just a single string (in fact, it looks a lot like 
the properties you have defined), it's even got another layer of 
indirection so you can define different types of update policy (rolling 
vs. canary, anybody?). It's an extremely flexible syntax.
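
To make the shape concrete, here is a rough illustration of that attribute written as a Python dict. The key names are the ones from the AWS page linked above; the values are made up, and policy_type() is just a hypothetical helper to show the extra layer of indirection, not anything that exists in Heat.

```python
# Illustrative only: a CloudFormation-style UpdatePolicy attribute as a dict.
# Key names follow the AWS documentation; the values are invented.
update_policy = {
    # The outer key selects the *type* of policy (the layer of indirection).
    "AutoScalingRollingUpdate": {
        "MinInstancesInService": 1,   # members that must stay up during the update
        "MaxBatchSize": 2,            # members updated at once
        "PauseTime": "PT1M30S",       # ISO 8601 duration between batches
    }
}


def policy_type(policy):
    """Return the name of the single policy type defined in the attribute.

    Hypothetical helper; raises ValueError if there is not exactly one type.
    """
    (name,) = policy.keys()
    return name
```

A different kind of policy (canary, say) would be another outer key with its own set of parameters, which is why this is clearly more than a single string.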

BTW, given that we already implemented this in autoscaling, it might be 
helpful to talk more specifically about what we need to do in addition 
in order to support the use cases you have in mind.

> update behavior. I want this resource to be able to control multiple
> groups as if they are one in some cases (Such as a case where a user
> has migrated part of an app to a new type of server, but not all.. so
> they will want to treat the entire aggregate as one rolling update).
>
> I'm o-k with overloading it to allow resource references, but I'd like
> to hear more people take issue with depends_on before I select that
> course.

Resource references in general, and depends_on in particular, feel like 
very much the wrong abstraction to me. This is a policy, not a resource.

> To answer your question, using it with a server instance allows
> rolling updates across non-grouped resources. In the example the
> rolling_update_dbs does this.

That's not a great example, because one DB server depends on the other, 
forcing them into updating serially anyway.

I have to say that even in general, this whole idea about applying 
update policies to non-grouped resources doesn't make a whole lot of 
sense to me. For non-grouped resources you control the resource 
definitions individually - if you don't want them to update at a 
particular time, you have the option of just not updating them.

Where you _do_ need it is for scaling groups where every server is based 
on the same launch config, so you need a way to control the members 
individually - by batching up operations (done), adding delays (done) 
or, even better, notifications and callbacks.
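
For reference, the batching-plus-delay behaviour amounts to something like the sketch below. This is not Heat's actual code; rolling_batches(), rolling_update() and their parameters are hypothetical names, loosely modelled on the MaxBatchSize / MinInstancesInService / PauseTime knobs.

```python
import time


def rolling_batches(members, max_batch_size, min_in_service=0):
    """Yield successive batches of group members to update.

    The batch size is capped so that at least min_in_service members
    are left untouched while any one batch is being updated.
    """
    if not members:
        return
    batch_size = min(max_batch_size, len(members) - min_in_service)
    if batch_size < 1:
        raise ValueError("cannot update without dropping below min_in_service")
    for start in range(0, len(members), batch_size):
        yield members[start:start + batch_size]


def rolling_update(members, update_one, max_batch_size,
                   min_in_service=0, pause=0.0, sleep=time.sleep):
    """Update members batch by batch, pausing between (not after) batches."""
    batches = list(rolling_batches(members, max_batch_size, min_in_service))
    for index, batch in enumerate(batches):
        for member in batch:
            update_one(member)
        if index < len(batches) - 1:
            sleep(pause)
```

The sleep() call is exactly the part that notifications and callbacks would replace: instead of guessing how long to wait, the group would wait to be told that the batch is good.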

So it seems like doing 'rolling' updates for any random subset of 
resources is effectively turning Heat into something of a poor man's 
workflow service, and IMHO that is probably a mistake.

What we do need for all resources (not just scaling groups) is a way for 
the user to say "for this particular resource, notify me when it has 
updated (but, if possible, before we have taken any destructive actions 
on it), give me a chance to test it and accept or reject the update". 
For example, when you resize a server, give the user a chance to confirm 
or reject the change at the VERIFY_RESIZE step (Trove requires this). Or 
when you replace a server during an update, give the user a chance to 
test the new server and either keep it (continue on and delete the old 
one) or not (roll back). Or when you replace a server in a scaling 
group, notify the load balancer _or some other thing_ (e.g. OpenShift 
broker node) that a replacement has been created and wait for it to 
switch over to the new one before deleting the old one. Or, of course, 
when you update a server to some new config, give the user a chance to 
test it out and make sure it works before continuing with the stack 
update. All of these use cases can, I think, be solved with a single 
feature.

The open questions for me are:
1) How do we notify the user that it's time to check on a resource? 
(Marconi?)
2) How does the user ack/nack? (You're suggesting reusing WaitCondition, 
and that makes sense to me.)
3) How do we break up the operations so the notification occurs at the 
right time? (With difficulty, but it should be doable.)
4) How does the user indicate for which resources they want to be 
notified? (Inside an update_policy? Another new directive at the 
type/properties/depends_on/update_policy level?)
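
As a very rough sketch of the single feature I have in mind, assuming the four questions above get answered somehow: every name here (gated_update, Verdict, the callables passed in) is hypothetical and nothing like this exists in Heat today.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def gated_update(resource, apply_change, notify, wait_for_verdict,
                 finalize, roll_back):
    """Update one resource, pausing for user confirmation.

    apply_change     -- the non-destructive half (e.g. create the replacement)
    notify           -- tell the user it is time to check (question 1)
    wait_for_verdict -- block until the user acks or nacks (question 2)
    finalize         -- the destructive half (e.g. delete the old server)
    roll_back        -- undo apply_change if the user rejects the update

    Returns True if the update was accepted, False if it was rolled back.
    """
    apply_change(resource)
    notify(resource)
    if wait_for_verdict(resource) is Verdict.ACCEPT:
        finalize(resource)
        return True
    roll_back(resource)
    return False
```

The resize case maps onto this directly: apply_change is the resize, the pause is VERIFY_RESIZE, and finalize/roll_back are confirm and revert. The hard part (question 3) is splitting existing update operations into the apply_change and finalize halves.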

>> It also seems that the interface you're creating (child_creating/child_updating) is fairly specific to your use case. For autoscaling we have a need for a more generic notification system, it would be nice to find common ground. Maybe we can invert the relationship? Add a "notified_resources" attribute, which would call hooks on the "parent" when actions are happening.
>>
>
> I'm open to a different interface design. I don't really have a firm
> grasp of the generic behavior you'd like to model though. This is quite
> concrete and would be entirely hidden from template authors, though not
> from resource plugin authors. Attributes sound like something where you
> want the template authors to get involved in specifying, but maybe that
> was just an overloaded term.
>
> So perhaps we can replace this interface with the generic one when your
> use case is more clear?

I'm not sure about the implementation Thomas proposed, but I believe the 
use case he has in mind is the third of the four I listed above (replace 
a server in a scaling group).

cheers,
Zane.


