[openstack-dev] [upgrades][skip-level][leapfrog] - RFC - Skipping releases when upgrading

Marios Andreou mandreou at redhat.com
Tue Jun 6 16:25:03 UTC 2017

On Fri, May 26, 2017 at 4:55 AM, Carter, Kevin <kevin at cloudnull.com> wrote:
> Hello Stackers,
Hi Kevin, all,

apologies for the very late response here - fwiw I was working at a remote
location all of last week and am still catching up. I was not at the PTG or
part of the original conversation, but this thread and etherpad have been
very helpful, so thank you very much for sharing. I'm mostly replying to say
'this is something TripleO/upgrades is interested in too' - obviously not
for the P cycle - and to add some thoughts on how TripleO does upgrades today.

Big +1 to David Simard's point that 'Making N to N+1 upgrades seamless and
work well is already challenging today' - that matches our experience.
Besides anything else, between versions we've also had to change the upgrade
workflow itself (the docs at [0] include a link to the composable services
spec that explains why the workflow had to change for Newton to Ocata
upgrades). The point is that we are very much still working towards a
seamless upgrade experience - we *are* improving with each release, most
notably N..O - for example, adding more pre-upgrade validations and trying
to minimize service downtime. Having said that, some comments inline on the
goal of skipping releases:

> As I'm sure many of you know there was a talk about doing "skip-level"[0]
> upgrades at the OpenStack Summit which quite a few folks were interested
> in. Today many of the interested parties got together and talked about
> doing more of this in a formalized capacity. Essentially we're looking for
> cloud upgrades with the possibility of skipping releases, ideally enabling
> an N+3 upgrade. In our opinion it would go a very long way to solving cloud
> consumer and deployer problems if folks didn't have to deal with an upgrade
> every six months. While we talked about various issues and some of the
> current approaches being kicked around we wanted to field our general chat
> to the rest of the community and request input from folks that may have
> already fought such a beast. If you've taken on an adventure like this how
> did you approach it? Did it work? Any known issues, gotchas, or things
> folks should be generally aware of?
> During our chat today we generally landed on an in-place upgrade with
> known API service downtime and little (at least as little as possible) data
> plane downtime. The process discussed was basically:
> a1. Create utility "thing-a-me" (container, venv, etc) which contains the
> required code to run a service through all of the required upgrades.
> a2. Stop service(s).
> a3. Run migration(s)/upgrade(s) for all releases using the utility
> "thing-a-me".
> a4. Repeat for all services.
> b1. Once all required migrations are complete run a deployment using the
> target release.
> b2. Ensure all services are restarted.
> b3. Ensure cloud is functional.
> b4. profit!
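Just to make sure I'm reading the steps right, the loop above could be
sketched roughly like this (everything here is hypothetical pseudo-tooling
recording what it would do, not the actual OSA code):

```python
# Hypothetical sketch of the per-service "leapfrog" loop above; every
# helper just records what it would run, and none of the names are real
# OSA/OpenStack tooling.

actions = []  # ordered record of what the upgrade would run

def stop_service(svc):
    actions.append(f"stop {svc}")                      # a2

def run_migration(svc, release):
    actions.append(f"migrate {svc} -> {release}")      # a3

def deploy(release):
    actions.append(f"deploy {release}")                # b1

def restart_all(svcs):
    actions.extend(f"restart {s}" for s in svcs)       # b2

def leapfrog(services, hops, target):
    for svc in services:                               # a4: every service
        stop_service(svc)
        for release in hops:                           # walk every hop
            run_migration(svc, release)
    deploy(target)
    restart_all(services)

leapfrog(["keystone", "nova"],
         ["kilo", "liberty", "mitaka", "newton"], "newton")
```

One thing a sketch like this makes obvious is that every intermediate
release's migration code has to be available somewhere on the box, which is
exactly where the "thing-a-me" comes in.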
> Obviously, there's a lot of hand waving here but such a process is being
> developed by the OpenStack-Ansible project[1]. Currently, the OSA tooling
> will allow deployers to upgrade from Juno/Kilo to Newton using Ubuntu
> 14.04. While this has worked in the lab, it's early in development (YMMV).
> Also, the tooling is not very general purpose or portable outside of OSA
> but it could serve as a guide or just a general talking point. Are there
> other tools out there that solve for the multi-release upgrade? Are there
> any folks that might want to share their expertise? Maybe a process outline
> that worked? Best practices? Do folks believe tools are the right way to
> solve this or would comprehensive upgrade documentation be better for the
> general community?
What about packages - what repos will we set up on these nodes? Will they
jump directly from the current version to the latest of the target, e.g.
N+2? Is that even possible - I mean, we may have to consider version-specific
packaging tasks. In TripleO we are currently using ansible tasks defined per
service manifest (e.g. the neutron l3 agent at [1]) to stop all the things,
and then we rely on puppet (puppet-tripleo and the service-specific puppet
modules) to update packages, run database migrations (e.g. [2]) and start
all the things again. The exception to this general rule of 'ansible down,
puppet up' is a few core services, which we want to recover immediately
rather than wait for the puppet run - like rabbit at [3], for example.
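To make the ordering concrete, here is a rough model of that 'ansible down,
puppet up' split, including the core-service exception - purely illustrative,
with made-up names, not real puppet-tripleo or tripleo-heat-templates code:

```python
# Rough, hypothetical model of the TripleO flow described above:
# ansible upgrade tasks stop everything first, puppet then updates
# packages, runs migrations and starts services - except core services
# like rabbitmq, which are recovered immediately rather than waiting.

log = []
CORE = {"rabbitmq"}  # recovered right away, not left down until puppet

def stop_step(services):
    for svc in services:
        log.append(f"ansible: stop {svc}")
        if svc in CORE:
            log.append(f"ansible: start {svc} (immediate recovery)")

def puppet_step(services):
    for svc in services:
        log.append(f"puppet: update+migrate+start {svc}")

services = ["rabbitmq", "neutron-l3-agent", "nova-api"]
stop_step(services)
puppet_step(s for s in services if s not in CORE)
```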

I am by no stretch an expert on the database migrations, so I'll leave that
discussion to more qualified folks, but from a general scaling point of view,
trying to maintain a single repo of all the migration things for all services
doesn't work - so +1 to the others here advocating that the migrations live
with each service and be compiled/applied by tooling at run time, whether
that is a container thing-a-me, puppet, or whatever. For TripleO you could
even override the puppet PostDeploy steps and run ansible tasks instead, if
that accomplished what you needed for the upgrades in your service list. In
fact the TripleO Ocata to Pike upgrade overrides those steps to run docker
instead of puppet (puppet is still invoked, however) to bring your services
up in containers.

Besides the obviously crucial migrations, there are other issues to
consider. We've had to deal with changes to the services themselves -
deprecations, for example removing foo-api.service and serving that API via
apache instead of eventlet. And then there are special-case bugs like
openvswitch: we had to special-case the ovs 2.4->2.5 update for M..N and
2.5->2.6 for N..O to prevent it from restarting during - and killing - the
upgrade. In today's workflow we would essentially need to combine all of
these into one invocation of the upgrade, but I really have not thought
about that in any detail.

thanks for reading, marios


> As most of the upgrade issues center around database migrations, we
> discussed some of the potential pitfalls at length. One approach was to
> roll-up all DB migrations into a single repository and run all upgrades for
> a given project in one step. Another was to simply have multiple python
> virtual environments and just run in-line migrations from a version
> specific venv (this is what the OSA tooling does). Does one way work better
> than the other? Any thoughts on how this could be better? Would having
> N+2/3 migrations addressable within the projects, even if they're not
> tested any longer, be helpful?
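For what it's worth, the venv-per-release idea seems straightforward to
reason about - something like the sketch below, where each intermediate
release's venv carries that release's in-tree migration code. The paths and
the "<project>-manage db sync" command form are made up for illustration:

```python
# Hypothetical sketch of the venv-per-release migration approach: each
# intermediate release has its own virtualenv, and we run that venv's
# manage command to walk the schema forward one hop at a time. The venv
# root and the exact db-sync command are illustrative only.

def migration_commands(project, hops, venv_root="/opt/leap-venvs"):
    """Build the ordered commands that walk a project's schema forward."""
    return [
        f"{venv_root}/{release}/bin/{project}-manage db sync"  # one hop
        for release in hops
    ]

cmds = migration_commands("keystone", ["kilo", "liberty", "mitaka", "newton"])
```

The nice property is that nothing here requires the projects to keep old
migrations addressable in their current trees - each venv is pinned to the
code that shipped with its release.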
> It was our general thought that folks would be interested in having the
> ability to skip releases so we'd like to hear from the community to
> validate our thinking. Additionally, we'd like to get more minds together
> and see if folks are wanting to work on such an initiative, even if this
> turns into nothing more than a co-op/channel where we can "phone a friend".
> Would it be good to try and secure some PTG space to work on this? Should
> we try to get a working group going?

> If you've made it this far, please forgive my stream of consciousness. I'm
> trying to ask a lot of questions and distill long form conversation(s) into
> as little text as possible all without writing a novel. With that said, I
> hope this finds you well, I look forward to hearing from (and working with)
> you soon.
> [0] https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading
> [1] https://github.com/openstack/openstack-ansible-ops/tree/master/leap-upgrades
> --
> Kevin Carter
> IRC: Cloudnull