[openstack-dev] [tripleo] TripleO Updates & Upgrades [was: Upgrade plans for RDO Manager - Brainstorming]

Steven Hardy shardy at redhat.com
Mon Sep 14 21:42:00 UTC 2015


Firstly, thanks Emilien for starting this discussion. I revised the subject
in an effort to get wider feedback; apologies for my delay in responding.

On Wed, Sep 09, 2015 at 11:34:26AM -0400, Zane Bitter wrote:
> On 24/08/15 15:12, Emilien Macchi wrote:
> >Hi,
> >
> >So I've been working on OpenStack deployments for 4 years now, and so far
> >RDO Manager is the second installer I've worked on, after SpinalStack [1].
> >
> >SpinalStack already had interesting features [2] that allowed us to
> >upgrade our customer platforms almost every month, with full testing
> >and automation.
> >
> >Now that we have RDO Manager, I would be happy to share my experience
> >on the topic and help to make it possible in the next cycle.
> >
> >For that, I created an etherpad [3], which is not too long and focused
> >on basic topics for now. This is technical and focused on Infrastructure
> >upgrade automation.
> >
> >Feel free to continue discussion on this thread or directly in the etherpad.
> >
> >[1] http://spinalstack.enovance.com
> >[2] http://spinalstack.enovance.com/en/latest/dev/upgrade.html
> >[3] https://etherpad.openstack.org/p/rdo-manager-upgrades
> 
> I added some notes on the etherpad, but I think this discussion poses a
> larger question: what is TripleO? Why are we using Heat? Because to me the
> major benefit of Heat is that it maintains a record of the current state of
> the system that can be used to manage upgrades. And if we're not going to
> make use of that - if we're going to determine the state of the system by
> introspecting nodes and update it by using Ansible scripts without Heat's
> knowledge, then we probably shouldn't be using Heat at all.

So, I think we should definitely learn from successful implementations such
as SpinalStack's. But given the way TripleO is currently implemented (e.g.
primarily orchestrating software configuration via Heat), and the philosophy
behind the project, I think it would be good to focus mostly on *what* needs
to be done, and not too much on *how* in terms of tooling at this point, and
definitely not to assume any up-front requirement for additional CM
tooling.

A major part of the value of TripleO, IMHO, is using OpenStack-native
tooling wherever possible (even if it means working to improve the tools
for all users/use-cases). Just like the initial deployment, I do think
updates and upgrades *are* possible to orchestrate via Heat
SoftwareDeployments, but there's also an external workflow component, which
is likely to be satisfied via tripleo-common (short term) and probably
Mistral (longer term).
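
To make that concrete, here's a minimal sketch of the SoftwareDeployment
pattern (resource and parameter names are illustrative, not the actual
tripleo-heat-templates ones):

  heat_template_version: 2014-10-16

  parameters:
    controller_server:
      type: string
      description: Nova server ID of the node to configure

  resources:
    controller_config:
      type: OS::Heat::SoftwareConfig
      properties:
        group: puppet  # applied on the node by the puppet hook
        config: {get_file: manifests/overcloud_controller.pp}

    controller_deployment:
      type: OS::Heat::SoftwareDeployment
      properties:
        config: {get_resource: controller_config}
        server: {get_param: controller_server}

Heat hashes the config contents, so editing the manifest (or any deployment
input) and doing a stack-update is enough for the config to be re-run on
the node.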

> I'm not saying that to close off the option - I think if Heat is not the
> best tool for the job then we should definitely consider other options. And
> right now it really is not the best tool for the job. Adopting Puppet (which
> was a necessary choice IMO) has meant that the responsibility for what I
> call "software orchestration"[1] is split awkwardly between Puppet and Heat.
> For example, the Puppet manifests are baked in to images on the servers, so
> Heat doesn't know when they've changed and can't retrigger Puppet to update
> the configuration when they do. We're left trying to reverse-engineer what
> is supposed to be a declarative model from the workflow that we want for
> things like updates/upgrades.

I don't really agree with this at all, tbh. The puppet *modules* are by
default distributed in the images, but any update to them is deployed via
either an RPM update (which Heat detects, provided it's applied via the
OS::TripleO::Tasks::PackageUpdate [1] interface, so puppet *can* be correctly
reapplied), or potentially via rsync [2] in future. A unique identifier is
all that's required to wire in puppet getting reapplied, via
NodeConfigIdentifiers [3].

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-resource-registry-puppet.yaml#L24
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/firstboot/userdata_dev_rsync.yaml
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-without-mergepy.yaml#L1262

The puppet *manifests* are distributed via Heat, so any update to those
will trigger Heat to reapply the manifest, the same as any other change to a
SoftwareConfig resource's config definition.
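
To illustrate the mechanism (a simplified sketch, with the wiring
abbreviated relative to the actual templates): the identifier only needs to
be passed in as a deployment input, and any change to its value on a
stack-update dirties the deployment, so puppet is re-run even when the
manifest itself is unchanged:

  update_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: some_puppet_config}  # hypothetical config
      server: {get_param: server}
      input_values:
        # Changing this value (e.g. after an RPM update bumps one of the
        # identifiers) forces Heat to re-trigger the deployment.
        update_identifier: {get_param: NodeConfigIdentifiers}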

I actually think we've ended up with a pretty clear split in responsibility
between Puppet and Heat: Heat does the orchestration and puts data in place
to be consumed by Puppet, which then owns all aspects of the software
configuration.

> That said, I think there's still some cause for optimism: in a world where
> every service is deployed in a container and every container has its own
> Heat SoftwareDeployment, the boundary between Heat's responsibilities and
> Puppet's would be much clearer. The deployment could conceivably fit a
> declarative model much better, and even offer a lot of flexibility in which
> services run on which nodes. We won't really know until we try, but it seems
> distinctly possible to aspire toward Heat actually making things easier
> rather than just not making them too much harder. And there is stuff on the
> long-term roadmap that could be really great if only we had time to devote
> to it - for example, as I mentioned in the etherpad, I'd love to get Heat's
> user hooks integrated with Mistral so that we could have fully-automated,
> highly-available (in a hypothetical future HA undercloud) live migration of
> workloads off compute nodes during updates.

Yup, definitely. As we move closer towards more granular role definitions,
and particularly container integration, I think the Heat declarative model,
its composability, and its built-in integration with other OpenStack
services will provide more obvious benefits vs tools geared solely towards
software configuration.

> In the meantime, however, I do think that we have all the tools in Heat that
> we need to cobble together what we need to do. In Liberty, Heat supports
> batched rolling updates of ResourceGroups, so we won't need to use user
> hooks to cobble together poor-man's batched update support any more. We can
> use the user hooks for their intended purpose of notifying the client when
> to live-migrate compute workloads off a server that is about to be upgraded.
> The Heat templates should already tell us exactly which services are running
> on which nodes. We can trigger particular software deployments on a stack
> update with a parameter value change (as we already do with the yum update
> deployment). For operations that happen in isolation on a single server, we
> can model them as SoftwareDeployment resources within the individual server
> templates. For operations that are synchronised across a group of servers
> (e.g. disabling services on the controller nodes in preparation for a DB
> migration) we can model them as a SoftwareDeploymentGroup resource in the
> parent template. And for chaining multiple sequential operations (e.g.
> disable services, migrate database, enable services), we can chain outputs
> to inputs to handle both ordering and triggering. I'm sure there will be
> many subtleties, but I don't think we *need* Ansible in the mix.

+1 - While I get that Ansible is a popular tool, given the current TripleO
implementation I don't think it's *needed* to orchestrate updates or
upgrades, and there are advantages to keeping the state associated with
cluster-wide operations inside Heat.
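
For reference, the Liberty feature Zane mentions is expressed as an
update_policy on the group; a sketch (counts and timings purely
illustrative):

  compute_group:
    type: OS::Heat::ResourceGroup
    update_policy:
      rolling_update:
        min_in_service: 2  # members left untouched while a batch updates
        max_batch_size: 1  # update one member per batch
        pause_time: 60     # seconds to wait between batches
    properties:
      count: 3
      resource_def:
        type: compute.yaml  # hypothetical nested server template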

I know from talking with Emilien that one aspect of SpinalStack's update
workflow we don't currently capture is the step of determining what is
about to be updated, then calculating a workflow associated with, e.g.,
restarting services in the right order. It'd be interesting to figure out
how that might be wired in via the current Heat model and maybe prototype
something which mimics what was done by SpinalStack via Ansible.
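
Purely as a strawman for that prototyping (nothing like this exists in the
templates today), the "what's about to change" step could itself be a
deployment whose only job is to report the pending package set as a Heat
output, which tripleo-common or Mistral could then consume to derive the
service-restart ordering:

  check_update_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      outputs:
        - name: pending_updates
      config: |
        #!/bin/bash
        # yum check-update exits 100 when updates are pending; capture the
        # package list as a deployment output instead of failing the stack.
        yum -q check-update > ${heat_outputs_path}.pending_updates || true

  check_update_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: check_update_config}
      server: {get_param: server}  # one such deployment per node in practice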

> So it's really up to the wider TripleO project team to decide which path to
> go down. I am genuinely not bothered whether we choose Heat or Ansible.
> There may even be ways they can work together without compromising either
> model. But I would be pretty uncomfortable with a mix where we use Heat for
> deployment and Ansible for doing upgrades behind Heat's back.

Perhaps it'd be helpful to work up a couple of specs (or just one covering
both), defining:

1. Strategy for Updates (defined as all incremental updates *not* requiring
any changes to DB schema or RPC version, e.g. consuming stable-branch
updates)

2. How we deal with (and test) Upgrades (e.g. moving from Kilo to Liberty,
where there are requirements to do DB schema and RPC version changes, and
not all services yet support the more advanced models implemented by e.g.
Nova)

Cheers,

Steve


