On Fri, Nov 5, 2021 at 2:39 PM Thierry Carrez <thierry@openstack.org> wrote:
Hi everyone,

The (long) document below reflects the current position of the release
management team on a popular question: should the OpenStack release
cadence be changed? Please note that we only address the release
management / stable branch management facet of the problem. There are
other dimensions to take into account (governance, feature deprecation,
supported distros...) to get a complete view of the debate.

Introduction
------------

The subject of how often OpenStack should be released has been regularly
debated in the OpenStack community. OpenStack started with a 3-month
release cycle, then switched to 6-month release cycle starting with
Diablo. It is often thought of as a release management decision, but it is
actually a much larger topic: a release cadence is a trade-off between
pressure to release more often and pressure to release less often,
coming in from a lot of different stakeholders. In OpenStack, it is
ultimately a Technical Committee decision. But that decision is informed
by the position of a number of stakeholders. This document gives
historical context and describes the current release management team
position.

The current trade-off
---------------------

The main pressure to release more often is to make features available to
users faster. Developers get a faster feedback loop, hardware vendors
ensure software is compatible with their latest products, and users get
exciting new features. "Release early, release often" is a best practice
in our industry -- we should generally aim at releasing as often as
possible.

But that is counterbalanced by pressure to release less often. From a
development perspective, each release cycle comes with some process
overhead. On the integrators side, a new release means packaging and
validation work. On the users side, it means pressure to upgrade. To
justify that cost, there needs to be enough user-visible benefit (like
new features) in a given release.

For the last 10 years for OpenStack, that balance has been around six
months. Six months let us accumulate enough new development that it was
worth upgrading to / integrating the new version, while giving enough
time to actually do the work. It also aligned well with the Foundation
events cadence, allowing us to synchronize in-person developer meeting
dates with the start of cycles.

I'm certainly not speaking on behalf of every project (or I might be, but I don't know the dynamics across projects well enough). Anyway, I think this assessment is missing one critical point: a release is the break-off point for bikeshedding. I see a lot of genuine urgency to finally settle decisions that have been going back and forth since early in the cycle, with the differences of opinion finally resolved when we hit feature freeze / RC time. This applies not only to feature work but also to lots of bugs that do not have active community members pushing to get them fixed (and backported) early.

That push before the RC is tagged is, and seems to have always been, a steadily intense few weeks before the release. It's taxing for sure, but it also ensures that we do get things done or make a real, active decision to push them down the line for at least another half a year. I think the biggest actual drawback of a longer release cycle is losing this checkpoint (let's be honest here, no one cares about milestones). Not only would a longer cycle make those hard decisions of "Are we including the work in this release or not" an even rarer occasion, but the push before RC would also intensify a lot, as we would have double the time to accumulate review debt before we actually have to have that discussion. I think more valuable contributions would be lost through loss of traction (with people just deciding to carry the patches as one more downstream-only thing because the community can't get around to it).

What changed
------------

The major recent change affecting this trade-off is that the pace of new
development in OpenStack slowed down. The rhythm of changes was divided
by 3 between 2015 and 2021, reflecting that OpenStack is now a mature
and stable solution, where accessing the latest features is no longer a
major driver. That reduces some of the pressure for releasing more
often. At the same time, we have more users every day, with larger and
larger deployments, and keeping those clusters constantly up to date is
an operational challenge. That increases the pressure to release less
often. In essence, OpenStack is becoming much more like an LTS
distribution than a web browser -- something users like to see moving slowly.

Over the past years, project teams also increasingly decoupled
individual components from the "coordinated release". More and more
components opted for an independent or intermediary-released model,
where they can put out releases in the middle of a cycle, making new
features available to their users. This increasingly opens up the
possibility of a longer "coordinated release" which would still allow
development teams to follow "release early, release often" best
practices. All that recent evolution means it is (again) time to
reconsider if the 6-month cadence is what serves our community best, and
in particular if a longer release cadence would not suit us better.

The release management team position on the debate
--------------------------------------------------

While releasing less often would definitely reduce the load on the
release management team, most of the team work being automated, we do
not think it should be a major factor in motivating the decision. We
should not adjust the cadence too often though, as there is a one-time
cost in switching our processes. In terms of impact, we expect that a
switch to a longer cycle will encourage more project teams to adopt a
"with-intermediary" release model (rather than the traditional "with-rc"
single release per cycle), which may lead to abandoning the latter,
hence simplifying our processes. Longer cycles might also discourage
people from committing to PTL or release liaison work. We'd probably need to
manage expectations there, and encourage more frequent switches (or
create alternate models).

While most PTL positions have not been decided by elections for a while, this is a great point to keep in mind. Fundamentally, we should keep that process as the main means of selecting new PTLs; pushing more and more towards handovers without even room for debate might bite us one day.

If the decision is made to switch to a longer cycle, the release
management team recommends switching to one year directly. That would
avoid changing it again anytime soon, and synchronizing on a calendar
year is much simpler to follow and communicate. We also recommend
announcing the change well in advance. We currently have an opportunity
of making the switch when we reach the end of the release naming
alphabet, which would also greatly simplify the communications around
the change.

Finally, it is worth mentioning the impact on the stable branch work.
Releasing less often would likely impact the number of stable branches
that we keep on maintaining, so that we do not go too much in the past
(and hit unmaintained distributions or long-gone dependencies). We
currently maintain releases for 18 months before they switch to extended
maintenance, which results in between 3 and 4 releases being maintained
at the same time. We'd recommend switching to maintaining one-year
releases for 24 months, which would result in between 2 and 3 releases
being maintained at the same time. Such a change would lead to longer
maintenance for our users while reducing backporting work for our
developers.

--
Thierry Carrez (ttx)
On behalf of the OpenStack Release Management team

In general, here are my 2 cents on the proposals, plus a couple of ideas to consider for easing the pain of the current model:

I really think the slowing pace (not due to a lack of work items) is real, and my biggest concern with a longer cycle is that it would drive us to lose even more momentum and valuable contributions (see my comment at the end of "current trade-off").
It would also greatly increase the pressure to backport features (no matter how much we claim this is not the case, we saw it very clearly first-hand downstream when we moved our product cycle to cover multiple upstream development cycles).
As Dan mentioned earlier in this thread, the downstream releases from different sources won't align anyway, and I think this would contribute even more to the drive to do work downstream rather than upstream.

In general, I hate the idea of LTS even more than a longer cycle, as you then effectively need to maintain 2 forks and put a lot of pressure on future contributors to align them, just so that you can have yet another forking point to start diverging again.

Perhaps there are a few things we could consider to ease the pain of upgrades over multiple cycles and the load of the release itself, while still keeping the momentum of the "break point":
1) Compress the last few weeks of the release, perhaps by aligning feature freeze with RC and bringing the final library releases closer as well. This might require us to have RCs of clients to ease the integration pressure. It could result in a few more RCs being tagged as things are discovered later, but those tags are fairly cheap.
2) Have any breaking changes (removal of anything deprecated) and disruptive DB migrations happen only in the cycle released in the second half of the year, while the first half would focus on bug fixes, non-intrusive feature work, etc.
3) Move to a maximum of one community goal per year, targeted to be finished in that second release (as the goals seem to be more and more disruptive in nature).

As a bonus point, I'd like to see a call to action to cut the test load we're generating. I think our gating across the projects has some serious redundancy in it (I still remember the panic when our average check and gate runs reached the 1hr mark). It's been great to see the efforts to cover more with our testing, and that is very important too, but I still think we're eating a lot of infra resources that could be freed up (especially easing the last weeks before any release point) without losing the quality of our testing.

The model above would give us the opportunity to have a real coordinated release as a checkpoint to get things done, while allowing distributions and consumers to worry about the major upgrade pain of only one release per year. It would still give us 2 occasions a year to hype the new release and all the positives coming with it, and keep the planning of work, resources & commitments in more manageable chunks. Most importantly, apart from not allowing breaking changes in the first release of the year, we should treat both releases as 1st class citizens, not "Development and LTS" or anything like that, just with the pain concentrated in the later one.

- Erno "jokke" Kuvaja