Open Stack

Mon Nov 8 19:43:18 UTC 2021

On Mon, Nov 8, 2021 at 10:44 AM Thierry Carrez <thierry at openstack.org> wrote:
>
> Ghanshyam Mann wrote:
> > [...]
> > Thanks Thierry for the detailed write up.
> >
> > At the same time, a shorter release which leads to upgrade-often pressure but
> > it will have fewer number of changes/features, so make the upgrade easy and
> > longer-release model will have more changes/features that will make upgrade more
> > complex.
>
> I think that was true a few years ago, but I'm not convinced that still
> holds. We currently have a third of the changes volume we had back in
> 2015, so a one-year release in 2022 would contain far less changes than
> a 6-month release from 2015.

I concur. Also, in 2015, we were still very much in a "move fast" mode
of operation as a community.

> Also, thanks to our testing and our focus on stability, the pain linked
> to the amount of breaking changes in a release is now negligible
> compared to the basic pain of going through a 1M-core deployment and
> upgrading the various pieces... every 6 months. I've heard of multiple
> users claiming it takes them close to 6 months to upgrade their massive
> deployments to a new version. So when they are done, they have to start
> again.
>
> --
> Thierry Carrez (ttx)
>

I've been hearing the exact same messaging from larger operators as
well as operators in environments where they are concerned about
managing risk for at least the past two years. These operators have
indicated it is not uncommon for the upgrade projects which consume,
test, certify for production, and deploy to production to take *at least*
six months to execute. At the same time, they are shy of being the
ones to also "find all of the bugs", and so the project doesn't
actually start until well after the new coordinated release has
occurred. Quickly they become yet another version behind with this
pattern.

I suspect it is really easy for us as a CI-focused community to think
that six months is plenty of time to roll out a fully updated
deployment which has been fully tested in every possible way. Except,
these operators are often trying to do just that on physical hardware,
with updated firmware and operating systems bringing in new variables
with every single change which may ripple up the entire stack. These
operators then have to apply the lessons they have previously learned
once they have worked through all of the variables. In some cases this
may involve aspects such as benchmarking, to ensure they don't need to
make additional changes which need to be factored into their
deployment, sending them back to the start of their testing. All while
thinking of phrases like "business/mission critical".

I guess this means I'm in support of revising the release cycle. At
the same time, I think it would be wise for us to see if we can learn
from these operators the pain points they experience, the process they
leverage, and ultimately see if there are opportunities to spread
knowledge or potentially tooling. Or maybe even get them to contribute
their patches upstream. Not that all of these issues are easily solved
with any level of code, but sometimes they can include contextual
disconnects and resolving those is just as important as shipping a
release, IMHO.

-Julia

[all][tc] Relmgt team position on release cadence
