[openstack-dev] [TripleO][Heat] Overcloud software updates and ResourceGroups
Pavlo Shchelokovskyy
pshchelokovskyy at mirantis.com
Fri Apr 3 06:41:14 UTC 2015
Hi,
On Fri, Apr 3, 2015 at 1:31 AM, Zane Bitter <zbitter at redhat.com> wrote:
> A few of us have been looking for a way to perform software updates to
> servers in a TripleO Heat/Puppet-based overcloud that avoids an impedance
> mismatch with Heat concepts and how Heat runs its workflow. As many
> talented TripleO-ers who have gone before can probably testify, that's
> surprisingly difficult to do, but we did come up with an idea that I think
> might work and which I'd like to get wider feedback on. For clarity, I'm
> speaking here in the context of the new overcloud-without-mergepy templates.
>
> The idea is that we create a SoftwareConfig that, when run, can update
> some software on the server. (The exact mechanism for the update is not
> important for this discussion; suffice to say that in principle it could be
> as simple as "[yum|apt-get] update".) The SoftwareConfig would have at
> least one input, though it need not do anything with the value.
>
> Then each server has that config deployed to it with a SoftwareDeployment
> at the time it is created. However, it is set to execute only on the UPDATE
> action. The value of (one of) the input(s) is obtained from a parameter.
>
> As a result, we can trigger the software update by simply changing the
> value of the input parameter, and the regular Heat dependency graph will be
> respected. The actual input value could be by convention a uuid, a
> timestamp, a random string, or just about anything so long as it changes.
>
> Here's a trivial example of what this deployment might look like:
>
> update_config:
>   type: OS::Heat::SoftwareConfig
>   properties:
>     config: {get_file: do_sw_update.sh}
>     inputs:
>       - name: update_after_time
>         description: Timestamp of the most recent update request
>
> update_deployment:
>   type: OS::Heat::SoftwareDeployment
>   properties:
>     actions:
>       - UPDATE
>     config: {get_resource: update_config}
>     server: {get_resource: my_server}
>     input_values:
>       update_after_time: {get_param: update_timestamp}
>
>
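For completeness, I guess the top-level template would also declare the
parameter that feeds this input; something along these lines (the parameter
name is taken from the snippet above, the default is my assumption):

parameters:
  update_timestamp:
    type: string
    default: ''
    description: Any value that changes when a software update should run
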
> (A possible future enhancement is that if you keep a mapping between
> previous input values and the system state after the corresponding update,
> you could even automatically handle rollbacks in the event the user decided
> to cancel the update.)
>
> And now we should be able to trigger an update to all of our servers, in
> the regular Heat dependency order, by simply (thanks to the fact that
> parameters now keep their previous values on stack updates unless they're
> explicitly changed) running a command like:
>
> heat stack-update my_overcloud -f $TMPL -P "update_timestamp=$(date)"
>
> (A future goal of Heat is to make specifying the template again optional
> too... I don't think that change landed yet, but in this case we can always
> obtain the template from Tuskar, so it's not so bad.)
>
>
> Astute readers may have noticed that this does not actually solve our
> problem. In reality groups of similar servers are deployed within
> ResourceGroups and there are no dependencies between the members. So, for
> example, all of the controller nodes would be updated in parallel, with the
> likely result that the overcloud could be unavailable for some time even if
> it is deployed with HA.
>
> The good news is that a solution to this problem is already implemented in
> Heat: rolling updates. For example, the controller node availability
> problem can be solved by setting a rolling update batch size of 1. The bad
> news is that rolling updates are implemented only for AutoscalingGroups,
> not ResourceGroups.
>
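For reference, with autoscaling groups the batching is expressed through the
group's update policy. If I remember the native group's syntax correctly, a
batch size of 1 for the controllers would look roughly like this (names and
sizes are placeholders, of course):

controller_group:
  type: OS::Heat::AutoScalingGroup
  update_policy:
    rolling_updates:
      max_batch_size: 1
      min_in_service: 2
      pause_time: 30
  properties:
    desired_capacity: 3
    min_size: 3
    max_size: 3
    resource:
      type: controller.yaml
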
It seems we should implement rolling_update for ResourceGroup, and ideally
rolling_create too. Just a couple of days ago we had a chat with Sahara's
PTL, Sergey Lukjanov, and he asked whether there is a way to create a number
of resources in batches with a single call. Sahara does not need autoscaling,
so our idea was exactly that: rolling_create and rolling_update for
ResourceGroups would solve it. That gives us one more use case, so I'm going
to raise a spec (or a bug?) soon; a rough sketch of what I have in mind is
below.
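Purely hypothetical syntax (the spec would need to pin this down), mirroring
the autoscaling group's policy:

controller_group:
  type: OS::Heat::ResourceGroup
  update_policy:
    rolling_create:
      max_batch_size: 5
      pause_time: 30
    rolling_update:
      max_batch_size: 1
      min_in_service: 2
      pause_time: 30
  properties:
    count: 3
    resource_def:
      type: controller.yaml

That would cover both Sahara's batched-create case and the TripleO
rolling-update case without pulling in any autoscaling machinery.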
> Accordingly, I propose that we switch the implementation of
> overcloud-without-mergepy from ResourceGroups to AutoscalingGroups. This
> would be a breaking change for overcloud updates (although no worse than
> the change from merge.py over to overcloud-without-mergepy), but that also
> means that there'll never be a better time than now to make it.
>
> I suspect that some folks (Tomas?) have possibly looked into this in the
> past... can anybody identify any potential obstacles to the change? Two
> candidates come to mind:
>
> 1) The SoftwareDeployments (plural) resource type. I believe we carefully
> designed that to work with both ResourceGroup and AutoscalingGroup though.
> 2) The elision feature (https://review.openstack.org/#/c/128365/). Steve,
> I think this was only implemented for ResourceGroup? An AutoscalingGroup
> version of this should be feasible though, or do we have better ideas for
> how to solve it in that context?
>
> cheers,
> Zane.
>
Best regards,
Pavlo Shchelokovskyy
Software Engineer
Mirantis Inc
www.mirantis.com