[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Zane Bitter zbitter at redhat.com
Wed Feb 5 18:24:33 UTC 2014


On 05/02/14 11:39, Clint Byrum wrote:
> Excerpts from Zane Bitter's message of 2014-02-04 16:14:09 -0800:
>> On 03/02/14 17:09, Clint Byrum wrote:
>>> UpdatePolicy in cfn is a single string, and causes very generic rolling
>>
>> Huh?
>>
>> http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
>>
>> Not only is it not just a single string (in fact, it looks a lot like
>> the properties you have defined), it's even got another layer of
>> indirection so you can define different types of update policy (rolling
>> vs. canary, anybody?). It's an extremely flexible syntax.
>>
>
> Oops, I relied a little too much on my memory and not enough on docs for
> that one. O-k, I will re-evaluate given actual knowledge of how it
> actually works. :-P

cheers :D
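
For anyone following along without the docs open, here is a rough
sketch of the shape of that attribute, written as a Python dict that
serialises to the JSON template fragment. The key names are the ones in
the cfn documentation linked above; the values are made up purely for
illustration.

```python
import json

# Shape of the cfn UpdatePolicy attribute on an autoscaling group.
# Key names follow the AWS docs linked above; values are illustrative only.
update_policy = {
    "UpdatePolicy": {
        # The extra layer of indirection: the policy type is itself a key,
        # so a different type of policy could be defined in the same place.
        "AutoScalingRollingUpdate": {
            "MinInstancesInService": "1",
            "MaxBatchSize": "2",
            "PauseTime": "PT5M",
        }
    }
}

print(json.dumps(update_policy, indent=2))
```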

>> BTW, given that we already implemented this in autoscaling, it might be
>> helpful to talk more specifically about what we need to do in addition
>> in order to support the use cases you have in mind.
>>
>
> As Robert mentioned in his mail, autoscaling groups won't allow us to
> inject individual credentials. With the ResourceGroup, we can make a
> nested stack with a random string generator so that is solved. Now the

\o/ for the random string generator solving the problem!

:-( for ResourceGroup being the only way to do it.

This is exactly why I hate ResourceGroup and think it was a mistake. 
Powerful software comes from being able to combine simple concepts in 
complex ways. Right now you have to choose between an autoscaling group, 
which has rolling updates, and a ResourceGroup which allows you to scale 
stacks. That sucks. What you need is to have both at the same time, and 
the way to do that is to allow autoscaling groups to scale stacks, as 
has long been planned.

At this point it would be a mistake to add a _complicated_ feature 
solely for the purpose of working around the fact that we can't yet 
combine two other, existing, features. It would be better to fix 
autoscaling groups to allow you to inject individual credentials and 
then add a simpler feature that does not need to create ad-hoc groups.

> other piece we need is to be able to directly choose machines to take
> out of commission, which I think we may have a simple solution to but I
> don't want to derail on that.
>
> The one used in AutoScalingGroups is also limited to just one group,
> thus it can be done all inside the resource.
>
>>> update behavior. I want this resource to be able to control multiple
>>> groups as if they are one in some cases (Such as a case where a user
>>> has migrated part of an app to a new type of server, but not all.. so
>>> they will want to treat the entire aggregate as one rolling update).
>>>
>>> I'm o-k with overloading it to allow resource references, but I'd like
>>> to hear more people take issue with depends_on before I select that
>>> course.
>>
>> Resource references in general, and depends_on in particular, feel like
>> very much the wrong abstraction to me. This is a policy, not a resource.
>>
>>> To answer your question, using it with a server instance allows
>>> rolling updates across non-grouped resources. In the example the
>>> rolling_update_dbs does this.
>>
>> That's not a great example, because one DB server depends on the other,
>> forcing them into updating serially anyway.
>>
>
> You're right, a better example is a set of (n) resource groups which
> serve the same service and thus we want to make sure we maintain the
> minimum service levels as a whole.

That's interesting, and I'd like to hear more about that use case and 
why it couldn't be solved using autoscaling groups, assuming the obstacle 
to using them at all were eliminated. If there's a real use case here 
beyond "work around lack of stack-scaling functionality" then I'm 
definitely open to being persuaded. I'd just like to make sure that it 
exists and justifies the extra complexity.

> If it were an order of magnitude harder to do it this way, I'd say
> sure let's just expand on the single-resource rolling update. But
> I think it won't be that much harder to achieve this and then the use
> case is solved.

I guess what I'm thinking is that your proposal is really two features:

1) Notifications/callbacks on update that allow the user to hook in to 
the workflow.
2) Rolling updates over ad-hoc groups (not autoscaling groups).

I think we all agree that (1) is needed; by my count ~6 really good use 
cases have been mentioned in this thread.

What I'm suggesting is that we probably don't need to do (2) at all if 
we fix autoscaling groups to be something you could use.
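
To make the split concrete, here is a minimal Python sketch. It is not
Heat's actual code, and every name in it (plan_batches, rolling_update,
update_member, on_batch) is hypothetical. It assumes a much simpler
model than the real thing: a fixed list of members, updated in batches
no larger than max_batch_size, with at least min_in_service members
left untouched while each batch is in progress. The on_batch callback
stands in for (1); the batching loop stands in for the rolling update
itself.

```python
from typing import Callable, Iterator, List, Optional, Sequence


def plan_batches(members: Sequence[str], min_in_service: int,
                 max_batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of members to update.

    Each batch is capped so that at least min_in_service members
    are left untouched while the batch is being updated.
    """
    capacity = len(members)
    batch_size = min(max_batch_size, capacity - min_in_service)
    if batch_size < 1:
        raise ValueError("policy leaves no room to update any member")
    for start in range(0, capacity, batch_size):
        yield list(members[start:start + batch_size])


def rolling_update(members: Sequence[str], min_in_service: int,
                   max_batch_size: int,
                   update_member: Callable[[str], None],
                   on_batch: Optional[Callable[[str, List[str]], None]] = None,
                   ) -> None:
    """Update members batch by batch, notifying a hook around each batch."""
    for batch in plan_batches(members, min_in_service, max_batch_size):
        if on_batch is not None:
            on_batch("pre-update", batch)   # hypothetical hook point
        for member in batch:
            update_member(member)
        if on_batch is not None:
            on_batch("post-update", batch)  # hypothetical hook point


if __name__ == "__main__":
    rolling_update(
        ["server-0", "server-1", "server-2", "server-3", "server-4"],
        min_in_service=2,
        max_batch_size=2,
        update_member=lambda m: print("updating", m),
        on_batch=lambda phase, batch: print(phase, batch),
    )
```

Note that the hook is useful whether the members come from an
autoscaling group or from an ad-hoc list; only the second function
cares where the group came from.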

Having reviewed the code for rolling updates in scaling groups, I can 
report that it is painfully complicated and that you'd be doing yourself 
a big favour by not attempting to reimplement it with ad-hoc groups ;). 
(To be fair, I don't think this would be quite as bad, though clearly it 
wouldn't be as good as not having to do it at all.) More concerning than 
that, though, is the way this looks set to make the template format even 
more arcane than it already is. We might eventually be able to deprecate 
resource types like ResourceGroup but we will be stuck with stuff like 
this approximately forever, so we'd better make sure it contains only what 
we need for the long term and isn't substantially shaped by tactical 
workarounds for temporary problems.

>> I have to say that even in general, this whole idea about applying
>> update policies to non-grouped resources doesn't make a whole lot of
>> sense to me. For non-grouped resources you control the resource
>> definitions individually - if you don't want them to update at a
>> particular time, you have the option of just not updating them.

(Clarification: at the time I wrote this I wasn't aware that TripleO was 
unable to use autoscaling groups in their current form, and the example 
on the wiki contained only two servers, not 10+.)

> If I have to calculate all the deltas and feed Heat 10 templates, each
> with one small delta, I'm writing the same code as I'm proposing for
> this rolling update feature, but I'm writing it outside of Heat. That
> seems counter-productive for all of the other Heat users who would find
> this useful.

That's true. But as I mentioned in my reply to Robert, you already 
started reimplementing autoscaling functionality when you had to 
generate your own templates with multiple nearly-identical servers. If 
the choice is between pushing more functionality (i.e. stack-scaling) 
into autoscaling so that it actually works for you, or pushing 
autoscaling functionality (i.e. rolling-update) out to ad-hoc groups, 
then I'd submit that the former is better for Heat, for TripleO, and for 
all of the other Heat users as well, because then nobody has to 
implement _any_ part of autoscaling outside of Heat.

To be clear, that's a big 'if' and there may be another use case that I 
am missing, but I think it's worthwhile to have the discussion.

cheers,
Zane.


