[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC
Zane Bitter
zbitter at redhat.com
Wed Feb 5 00:14:09 UTC 2014
On 03/02/14 17:09, Clint Byrum wrote:
> Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
>>> So, I wrote the original rolling updates spec about a year ago, and the
>>> time has come to get serious about implementation. I went through it and
>>> basically rewrote the entire thing to reflect the knowledge I have
>>> gained from a year of working with Heat.
>>>
>>> Any and all comments are welcome. I intend to start implementation very
>>> soon, as this is an important component of the HA story for TripleO:
>>>
>>> https://wiki.openstack.org/wiki/Heat/Blueprints/RollingUpdates
>>
>> Hi Clint, thanks for pushing this.
>>
>> First, I don't think RollingUpdatePattern and CanaryUpdatePattern should be two different entities. The second just looks like a parametrization of the first (growth_factor=1?).
>
> Perhaps they can just be one. Until I find parameters which would need
> to mean something different, I'll just use UpdatePattern.
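Thomas's suggestion that a canary update is just a parametrization of a rolling update could look something like the sketch below. It is a hypothetical illustration, not code from the spec: `batch_schedule`, `initial_batch` and `growth_factor` are names invented here. With growth_factor=1 every batch has the same size (a plain rolling update). With growth_factor > 1 the first batch is a small canary and later batches grow.

```python
def batch_schedule(total: int, initial_batch: int = 1, growth_factor: float = 1.0) -> list:
    """Split `total` members into successive update batch sizes (hypothetical).

    growth_factor == 1 -> fixed-size batches (plain rolling update)
    growth_factor > 1  -> small first batch that grows (canary-style update)
    """
    if total < 0 or initial_batch < 1 or growth_factor < 1:
        raise ValueError("need total >= 0, initial_batch >= 1, growth_factor >= 1")
    batches = []
    size = float(initial_batch)
    remaining = total
    while remaining > 0:
        # Round the target size down, but always make progress.
        batch = min(remaining, max(1, int(size)))
        batches.append(batch)
        remaining -= batch
        size *= growth_factor
    return batches
```

For example, batch_schedule(10, initial_batch=1, growth_factor=2) gives [1, 2, 4, 3], while batch_schedule(10, initial_batch=2) gives five batches of 2.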
>
>>
>> I then feel that using (abusing?) depends_on for the update pattern is a bit weird. Maybe I'm influenced by the CFN design, but the separate UpdatePolicy attribute feels better (although I would probably use a property). I guess my main question is about what it means to use the update pattern on a server instance. I think I see what you want to do for the group, where child_updating would return a number, but I have no idea what it means for a single resource. Could you describe the operation in more detail in the document?
>>
>
> I would be OK with adding another keyword. The idea behind abusing depends_on
> is that it changes the core language less. Properties are definitely out
> for the reasons Christopher brought up; properties are really meant to
> be for the resource's end target only.
Agree, -1 for properties - those belong to the resource, and this data
belongs to Heat.
> UpdatePolicy in cfn is a single string, and causes very generic rolling
Huh?
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
Not only is it not a single string (in fact, it looks a lot like
the properties you have defined), it even has another layer of
indirection so you can define different types of update policy (rolling
vs. canary, anybody?). It's an extremely flexible syntax.

BTW, given that we have already implemented this in autoscaling, it might
be helpful to talk more specifically about what else we need to do to
support the use cases you have in mind.
> update behavior. I want this resource to be able to control multiple
> groups as if they were one in some cases (such as when a user
> has migrated part of an app to a new type of server, but not all of it,
> so they will want to treat the entire aggregate as one rolling update).
>
> I'm OK with overloading it to allow resource references, but I'd like
> to hear more people take issue with depends_on before I select that
> course.
Resource references in general, and depends_on in particular, feel like
very much the wrong abstraction to me. This is a policy, not a resource.
> To answer your question, using it with a server instance allows
> rolling updates across non-grouped resources. In the example,
> rolling_update_dbs does this.
That's not a great example, because one DB server depends on the other,
forcing them to update serially anyway.

I have to say that even in general, this whole idea about applying
update policies to non-grouped resources doesn't make a whole lot of
sense to me. For non-grouped resources you control the resource
definitions individually - if you don't want them to update at a
particular time, you have the option of just not updating them.

Where you _do_ need it is for scaling groups where every server is based
on the same launch config, so you need a way to control the members
individually - by batching up operations (done), adding delays (done)
or, even better, notifications and callbacks.
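The batching and delays described as already done follow the CloudFormation AutoScalingRollingUpdate policy, whose MaxBatchSize, MinInstancesInService and PauseTime keys are documented at the URL above. The function below is only a simplified sketch of that idea, not Heat's implementation: `rolling_update` and its arguments are hypothetical, the pause is given in plain seconds rather than the ISO 8601 duration CloudFormation uses, and it assumes each member is out of service while it is being updated.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def rolling_update(
    members: Sequence[T],
    update: Callable[[T], None],
    max_batch_size: int,
    min_in_service: int,
    pause_seconds: float = 0.0,
) -> List[List[T]]:
    """Update members in batches (hypothetical sketch of a rolling update)."""
    if not members:
        return []
    # A batch may not exceed max_batch_size, nor take so many members out of
    # service that fewer than min_in_service remain untouched.
    batch_size = min(max_batch_size, len(members) - min_in_service)
    if batch_size < 1:
        raise ValueError("min_in_service leaves no room to update anything")
    batches = []
    for start in range(0, len(members), batch_size):
        batch = list(members[start:start + batch_size])
        for member in batch:
            update(member)
        batches.append(batch)
        # Pause between batches, but not after the last one.
        if start + batch_size < len(members):
            time.sleep(pause_seconds)
    return batches
```

With five members, max_batch_size=2 and min_in_service=3, this updates ["a", "b"], then ["c", "d"], then ["e"]; raising min_in_service to 4 forces one member per batch.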

So it seems like doing 'rolling' updates for any random subset of
resources is effectively turning Heat into something of a poor man's
workflow service, and IMHO that is probably a mistake.

What we do need for all resources (not just scaling groups) is a way for
the user to say "for this particular resource, notify me when it has
updated (but, if possible, before we have taken any destructive actions
on it), give me a chance to test it and accept or reject the update".
For example, when you resize a server, give the user a chance to confirm
or reject the change at the VERIFY_RESIZE step (Trove requires this). Or
when you replace a server during an update, give the user a chance to
test the new server and either keep it (continue on and delete the old
one) or not (roll back). Or when you replace a server in a scaling
group, notify the load balancer _or some other thing_ (e.g. OpenShift
broker node) that a replacement has been created and wait for it to
switch over to the new one before deleting the old one. Or, of course,
when you update a server to some new config, give the user a chance to
test it out and make sure it works before continuing with the stack
update. All of these use cases can, I think, be solved with a single
feature.
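The common shape of these use cases is: do the constructive half of an update, hand the result to someone (the user, a load balancer, an OpenShift broker node) for a verdict, and only then do the destructive half. A minimal sketch, with every name hypothetical; `confirm` stands in for whatever mechanism eventually carries the accept/reject decision.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


def replace_with_confirmation(
    old: R,
    create_replacement: Callable[[R], R],
    confirm: Callable[[R], bool],
    delete: Callable[[R], None],
) -> R:
    """Replace `old`, deferring every destructive step until the replacement
    has been accepted or rejected (hypothetical sketch)."""
    new = create_replacement(old)
    if confirm(new):
        # Accepted: carry on with the update and remove the old resource.
        delete(old)
        return new
    # Rejected: roll back by discarding the replacement; `old` was never touched.
    delete(new)
    return old
```

Because nothing is deleted before `confirm` returns, a rejection leaves the original resource exactly as it was.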

The open questions for me are:
1) How do we notify the user that it's time to check on a resource?
(Marconi?)
2) How does the user ack/nack? (You're suggesting reusing WaitCondition,
and that makes sense to me.)
3) How do we break up the operations so the notification occurs at the
right time? (With difficulty, but it should be do-able.)
4) How does the user indicate for which resources they want to be
notified? (Inside an update_policy? Another new directive at the
type/properties/depends_on/update_policy level?)
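For question 2, the ack/nack side could behave roughly like a WaitCondition: the user sends a SUCCESS or FAILURE signal (the status values WaitCondition signals already use), and a timeout counts as a rejection. The class below is a hypothetical, in-process sketch of those semantics only; how the notification reaches the user (question 1) is left out.

```python
import threading
from typing import Optional, Tuple


class UpdateAck:
    """Hypothetical ack/nack channel for a single resource update."""

    VALID_STATUSES = ("SUCCESS", "FAILURE")

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._status: Optional[str] = None
        self._reason = ""

    def signal(self, status: str, reason: str = "") -> None:
        """Record the user's verdict; only the first signal counts."""
        if status not in self.VALID_STATUSES:
            raise ValueError("status must be SUCCESS or FAILURE")
        with self._lock:
            if self._done.is_set():
                return
            self._status = status
            self._reason = reason
            self._done.set()

    def wait(self, timeout: float) -> Tuple[bool, str]:
        """Block until a signal arrives; a timeout is treated as a nack."""
        if not self._done.wait(timeout):
            return False, "timed out waiting for a signal"
        return self._status == "SUCCESS", self._reason
```

Treating a timeout as a nack means an update never proceeds past a destructive step just because nobody answered.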
>> It also seems that the interface you're creating (child_creating/child_updating) is fairly specific to your use case. For autoscaling we need a more generic notification system, so it would be nice to find common ground. Maybe we can invert the relationship? Add a "notified_resources" attribute, which would call hooks on the "parent" when actions are happening.
>>
>
> I'm open to a different interface design. I don't really have a firm
> grasp of the generic behavior you'd like to model though. This is quite
> concrete and would be entirely hidden from template authors, though not
> from resource plugin authors. Attributes sound like something that
> template authors would get involved in specifying, but maybe that
> was just an overloaded term.
>
> So perhaps we can replace this interface with the generic one when your
> use case is more clear?
I'm not sure about the implementation Thomas proposed, but I believe the
use case he has in mind is the third of the four I listed above (replace
a server in a scaling group).
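Thomas's inverted relationship could look roughly like this: rather than an update-pattern resource calling child_creating/child_updating on its members, each resource calls hooks on whatever resources it lists as notified_resources, before and after it acts. This is a hypothetical sketch; `on_child_action` and the "pre"/"post" phases are invented here, and Heat's real Resource class is far richer. In the scaling-group case, the load balancer would be the notified resource.

```python
from typing import Callable, Iterable, List, Tuple


class Resource:
    """Toy resource that notifies the resources listed in notified_resources."""

    def __init__(self, name: str, notified_resources: Iterable["Resource"] = ()) -> None:
        self.name = name
        self.notified_resources = list(notified_resources)
        self.events: List[Tuple[str, str, str]] = []

    def on_child_action(self, child: "Resource", action: str, phase: str) -> None:
        # Hook called by resources that list this one in notified_resources.
        self.events.append((child.name, action, phase))

    def _notify(self, action: str, phase: str) -> None:
        for target in self.notified_resources:
            target.on_child_action(self, action, phase)

    def update(self, do_update: Callable[[], None]) -> None:
        # Notify before and after the action so the target can react at both points.
        self._notify("UPDATE", "pre")
        do_update()
        self._notify("UPDATE", "post")
```

Here the resource being updated only knows whom to tell; what the notified resource does with the hook (drain a pool member, switch traffic) stays its own business.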
cheers,
Zane.