[Openstack-operators] [openstack-dev] [stable][all] Keeping Juno "alive" for longer.

Clayton O'Neill clayton at oneill.net
Mon Nov 9 21:20:35 UTC 2015

I do think it’s gotten better in some ways and I expect it will continue to
get better.  The Nova team’s work for live upgrades is definitely the way
we’d like to see everything going.

However, I think a lot of us have had to fight with OVS agent restarts
during upgrades (I know it’s fixed in Liberty).  Being able to keep the
control plane up during upgrades is definitely in the “nice to have”
category until we get no-impact data plane upgrades in place with Liberty.
With our Kilo upgrade we went to ridiculous lengths to avoid data plane
outages and *mostly* succeeded.

For us, a ton of the work is the management headache that occurs in
upgrading Puppet modules, finding deprecations, finding the places the
Puppet modules haven’t been fixed for the new release, etc.  It’s not that
it’s a single big issue, it’s the death of a thousand cuts.

Some of our issues with upgrading to Kilo actually turned out to just be
poor quality release notes.  Specifically, we were bitten several times by
deprecated options being removed without being mentioned in the release
notes.  That’s arguably our fault; we should have taken care of that before
our upgrade, but there isn’t any excuse for it not being in the release
notes either.  I’m hopeful that’s going to be less of an issue with Mitaka,
since Reno should help address that.

I think we were probably one of the earlier upgraders for Juno and Kilo
(2-3 months after release).  For Kilo, we just hit a bunch of bugs.  I
don’t have any specific feedback on how to address that.

We started working on our Kilo upgrade the week of the Vancouver summit and
aside from when we were blocked, we had 1-3 people on that task full time
until we did our prod upgrade in August.  To be fair, we were blocked on
external or internal issues probably about half that time if I had to guess.

If you want more details on how we do upgrades or the Kilo issues we ran
into, the slides from our Tokyo upgrade talk are online here -

On Mon, Nov 9, 2015 at 4:05 PM, Sean Dague <sean at dague.net> wrote:

> On 11/09/2015 03:49 PM, Maish Saidel-Keesing wrote:
> > On 11/09/15 22:06, Tom Cameron wrote:
> >>> I would not call that the extreme minority.
> >>> I would say a good percentage of users are only getting to Juno now.
> >> The survey seems to indicate lots of people are on Havana, Icehouse
> >> and Juno in production. I would love to see the survey ask _why_
> >> people are on older versions because for many operators I suspect they
> >> forked when they needed a feature or function that didn't yet exist,
> >> and they're now stuck in a horrible parallel universe where upstream
> >> has not only added the missing feature but has also massively improved
> >> code quality. Meanwhile, they can't spend the person hours on either
> >> porting their work into the new Big Tent world we live in, or can't
> >> bear the thought of having to throw away their hard earned tech debt.
> >> For more on this, see the myth of the "sunken cost".
> >>
> >> If it turns out people really are deploying new clouds with old
> >> versions on purpose because of a perceived stability benefit, then
> >> they aren't reading the release schedule pages closely enough to see
> >> that what they're deploying today will be abandoned in the near
> >> future. In my _personal_ opinion, which has nothing to do with
> >> Openstack or my employer, this is really poor operational due diligence.
> > I don't think people are deploying old clouds or old versions.
> > They are just stuck on older versions. Why? As Matt said in his reply,
> > the upgrade process is hell! And when your environment grows past a
> > certain point, if you have to upgrade say 100 hosts, it can take a
> > good couple of months to get the quirks fixed and sorted out, and then
> > you have to start all over again, because the next release just came out.
> Can you be more specific about "upgrade process is hell!"? We continue
> to work on improvements in upgrade testing to block patches that will
> make life hell for upgrading. Getting a bunch of specifics on bugs that
> triggered during upgrade by anyone doing it would go a long way in
> helping us figure out what's the next soft spot to tackle there.
> But without that data coming back in specifics it's hard to close
> whatever gap is here, real or perceived.
>         -Sean
> --
> Sean Dague
> http://dague.net