[openstack-dev] [Heat] [TripleO] Rolling updates spec re-written. RFC

Christopher Armstrong chris.armstrong at rackspace.com
Wed Feb 5 01:04:45 UTC 2014


On Tue, Feb 4, 2014 at 6:14 PM, Zane Bitter <zbitter at redhat.com> wrote:

> On 03/02/14 17:09, Clint Byrum wrote:
>
>> Excerpts from Thomas Herve's message of 2014-02-03 12:46:05 -0800:
>>
>>>
>>>>
>>  update behavior. I want this resource to be able to control multiple
>> groups as if they are one in some cases (such as a case where a user
>> has migrated part of an app to a new type of server, but not all, so
>> they will want to treat the entire aggregate as one rolling update).
>>
>> I'm OK with overloading it to allow resource references, but I'd like
>> to hear more people take issue with depends_on before I select that
>> course.
>>
>
> Resource references in general, and depends_on in particular, feel like
> very much the wrong abstraction to me. This is a policy, not a resource.
>
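
(To make sure I'm following the aggregate case, and with every name below
invented by me, it reads as two groups sharing one update-pattern resource,
and the question is whether they should point at it with a reference or
with depends_on. A rough sketch, not the spec's actual syntax:

  resources:
    web_update_pattern:
      type: OS::Heat::RollingUpdatePattern       # invented type name
      properties:
        min_in_service: 2
        max_batch_size: 1

    old_web_group:                               # pre-migration servers
      type: OS::Heat::InstanceGroup
      update_policy:
        rolling_update: {get_resource: web_update_pattern}  # reference style
      properties:
        AvailabilityZones: ["nova"]              # placeholder
        LaunchConfigurationName: old-web-config  # placeholder name
        Size: 4

    new_web_group:                               # post-migration servers
      type: OS::Heat::InstanceGroup
      update_policy:
        rolling_update: {get_resource: web_update_pattern}
      properties:
        AvailabilityZones: ["nova"]              # placeholder
        LaunchConfigurationName: new-web-config  # placeholder name
        Size: 2

With one shared pattern, updates to both groups would be batched together.)
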
>
>> To answer your question, using it with a server instance allows
>> rolling updates across non-grouped resources. In the example the
>> rolling_update_dbs does this.
>>
>
> That's not a great example, because one DB server depends on the other,
> forcing them into updating serially anyway.
>
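
(To make that concrete, this is roughly the shape of the example being
discussed, with made-up names; if db2 already depends_on db1, Heat will
update them one after the other regardless of any rolling-update policy:

  heat_template_version: 2013-05-23

  resources:
    db1:
      type: OS::Nova::Server
      properties:
        image: fedora-20        # placeholder image/flavor
        flavor: m1.small

    db2:
      type: OS::Nova::Server
      depends_on: db1           # the dependency alone serializes db2 after db1
      properties:
        image: fedora-20
        flavor: m1.small

so the rolling-update policy wouldn't add any ordering that isn't already
there.)
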
> I have to say that even in general, this whole idea about applying update
> policies to non-grouped resources doesn't make a whole lot of sense to me.
> For non-grouped resources you control the resource definitions individually
> - if you don't want them to update at a particular time, you have the
> option of just not updating them.
>
> Where you _do_ need it is for scaling groups where every server is based
> on the same launch config, so you need a way to control the members
> individually - by batching up operations (done), adding delays (done) or,
> even better, notifications and callbacks.
>
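
(For reference, the batching and delay knobs that already exist look
something like this, using the AWS-compatible group; property names from
memory and the values are just examples:

  heat_template_version: 2013-05-23

  resources:
    config:
      type: AWS::AutoScaling::LaunchConfiguration
      properties:
        ImageId: fedora-20          # placeholder
        InstanceType: m1.small      # placeholder

    group:
      type: AWS::AutoScaling::AutoScalingGroup
      update_policy:
        AutoScalingRollingUpdate:
          MinInstancesInService: 1  # keep at least one member in service
          MaxBatchSize: 2           # replace at most two members per batch
          PauseTime: PT1M           # wait one minute between batches
      properties:
        AvailabilityZones: ["nova"]             # placeholder AZ
        LaunchConfigurationName: {get_resource: config}
        MinSize: 3
        MaxSize: 3

Notifications and callbacks would be the piece layered on top of that.)
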
> So it seems like doing 'rolling' updates for any random subset of
> resources is effectively turning Heat into something of a poor-man's
> workflow service, and IMHO that is probably a mistake.
>
> What we do need for all resources (not just scaling groups) is a way for
> the user to say "for this particular resource, notify me when it has
> updated (but, if possible, before we have taken any destructive actions on
> it), give me a chance to test it and accept or reject the update". For
> example, when you resize a server, give the user a chance to confirm or
> reject the change at the VERIFY_RESIZE step (Trove requires this). Or when
> you replace a server during an update, give the user a chance to test the
> new server and either keep it (continue on and delete the old one) or not
> (roll back). Or when you replace a server in a scaling group, notify the
> load balancer _or some other thing_ (e.g. OpenShift broker node) that a
> replacement has been created and wait for it to switch over to the new one
> before deleting the old one. Or, of course, when you update a server to
> some new config, give the user a chance to test it out and make sure it
> works before continuing with the stack update. All of these use cases can,
> I think, be solved with a single feature.
>
> The open questions for me are:
> 1) How do we notify the user that it's time to check on a resource?
> (Marconi?)
> 2) How does the user ack/nack? (You're suggesting reusing WaitCondition,
> and that makes sense to me.)
> 3) How do we break up the operations so the notification occurs at the
> right time? (With difficulty, but it should be do-able.)
> 4) How does the user indicate for which resources they want to be
> notified? (Inside an update_policy? Another new directive at the
> type/properties/depends_on/update_policy level?)
>
>
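On (2) and (4): purely as a strawman to make that concrete, I could imagine
the ack living in update_policy and reusing a wait condition handle for the
signal. Completely hypothetical syntax, nothing like this exists today:

  resources:
    confirm_handle:
      type: AWS::CloudFormation::WaitConditionHandle

    app_server:
      type: OS::Nova::Server
      properties:
        image: my-new-image     # placeholder
        flavor: m1.small
      update_policy:
        confirm_update:                          # hypothetical directive
          handle: {get_resource: confirm_handle}
          timeout: 600                           # seconds to wait for an ack
          on_timeout: rollback                   # hypothetical: revert if no ack

Whatever is doing the testing would signal the handle URL with SUCCESS to
let the update continue, or FAILURE (or just let it time out) to trigger a
rollback.
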
To relate this to another interesting feature, I think it would also be
super awesome if Heat grew the ability to support remotely-hosted resource
*types* (in addition to the resource notifications you're talking about) by
way of an API over Marconi (or maybe just a simple REST API that Heat would
invoke). I'm pretty sure CFN has something like this too, using their
queue service, and I think the custom code ACKs back over the queue
service to indicate that operations are complete, FWIW.
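
A totally made-up sketch of what the template side of that might look like
(none of these type or property names exist today):

  resources:
    my_widget:
      type: OS::Heat::RemoteResource      # hypothetical remote type
      properties:
        queue: my-provider-queue          # Marconi queue the provider listens on
        parameters:                       # opaque blob handed to the provider
          size: 5
          flavor: gold

Heat would post create/update/delete messages (the properties plus a
reply-to) on the queue, and the provider would post status and any
attribute values back when it's done.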



>>> It also seems that the interface you're creating
>>> (child_creating/child_updating) is fairly specific to your use case.
>>> For autoscaling we have a need for a more generic notification system,
>>> it would be nice to find common ground. Maybe we can invert the
>>> relationship? Add a "notified_resources" attribute, which would call
>>> hooks on the "parent" when actions are happening.
>>>
>> I'm open to a different interface design. I don't really have a firm
>> grasp of the generic behavior you'd like to model though. This is quite
>> concrete and would be entirely hidden from template authors, though not
>> from resource plugin authors. Attributes sound like something where you
>> want the template authors to get involved in specifying, but maybe that
>> was just an overloaded term.
>>
>> So perhaps we can replace this interface with the generic one when your
>> use case is more clear?
>>
>
> I'm not sure about the implementation Thomas proposed, but I believe the
> use case he has in mind is the third of the four I listed above (replace a
> server in a scaling group).
>
>

I think another use case is temporarily removing a server from a load
balancer while it's being resized, for example.


-- 
IRC: radix
http://twitter.com/radix
Christopher Armstrong
Rackspace