[all][tc] Relmgt team position on release cadence

Erno Kuvaja ekuvaja at redhat.com
Wed Dec 8 13:50:46 UTC 2021


On Fri, Nov 5, 2021 at 2:39 PM Thierry Carrez <thierry at openstack.org> wrote:

> Hi everyone,
>
> The (long) document below reflects the current position of the release
> management team on a popular question: should the OpenStack release
> cadence be changed? Please note that we only address the release
> management / stable branch management facet of the problem. There are
> other dimensions to take into account (governance, feature deprecation,
> supported distros...) to get a complete view of the debate.
>
> Introduction
> ------------
>
> The subject of how often OpenStack should be released has been regularly
> debated in the OpenStack community. OpenStack started with a 3-month
> release cycle, then switched to 6-month release cycle starting with
> Diablo. It is often thought of as a release management decision, but it is
> actually a much larger topic: a release cadence is a trade-off between
> pressure to release more often and pressure to release less often,
> coming in from a lot of different stakeholders. In OpenStack, it is
> ultimately a Technical Committee decision. But that decision is informed
> by the position of a number of stakeholders. This document gives
> historical context and describes the current release management team
> position.
>
> The current trade-off
> ---------------------
>
> The main pressure to release more often is to make features available to
> users faster. Developers get a faster feedback loop, hardware vendors
> ensure software is compatible with their latest products, and users get
> exciting new features. "Release early, release often" is a best practice
> in our industry -- we should generally aim at releasing as often as
> possible.
>
> But that is counterbalanced by pressure to release less often. From a
> development perspective, each release cycle comes with some process
> overhead. On the integrators side, a new release means packaging and
> validation work. On the users side, it means pressure to upgrade. To
> justify that cost, there needs to be enough user-visible benefit (like
> new features) in a given release.
>
> For the past 10 years of OpenStack, that balance has been around six
> months. Six months let us accumulate enough new development that it was
> worth upgrading to / integrating the new version, while giving enough
> time to actually do the work. It also aligned well with the cadence of
> Foundation events, allowing us to synchronize in-person developer
> meeting dates with the start of cycles.
>

I'm certainly not speaking on behalf of every project (or I might be, but I
just don't know the dynamics across projects well enough). Anyway, I think
this assessment misses one critical point: a release is the cut-off point
for bikeshedding. I see a lot of genuine urgency to finally make decisions
that have been going back and forth since early in the cycle, with
differences of opinion finally getting resolved when we hit feature
freeze / RC time. This applies not only to feature work but also to many
bugs that have no active community members pushing to get them fixed (and
backported) early.

The push before the RC is tagged is, and seems to have consistently been,
an intense few weeks before release. It's taxing for sure, but it also
ensures that we do get things done, or make a real, active decision to push
them down the line for at least another half year. I think the biggest
actual drawback of a longer release cycle is losing this checkpoint (let's
be honest here, no one cares about milestones). Not only would a longer
release make those hard "are we including this work in this release or
not" decisions an even rarer occasion, but the push before RC would also
intensify a lot, since we would have twice the time to accumulate review
debt before we actually have to have that discussion. I think more
valuable contributions would be lost as they lose traction (and people
just decide to carry the patches as yet another downstream-only thing
because the community can't get around to them).

> What changed
> ------------
>
> The major recent change affecting this trade-off is that the pace of new
> development in OpenStack has slowed down. The rate of changes dropped by
> a factor of 3 between 2015 and 2021, reflecting that OpenStack is now a
> mature and stable solution, where accessing the latest features is no
> longer a major driver. That reduces some of the pressure for releasing
> more often. At the same time, we have more users every day, with larger
> and larger deployments, and keeping those clusters constantly up to date
> is an operational challenge. That increases the pressure to release less
> often. In essence, OpenStack is becoming much more like an LTS
> distribution than a web browser: something users like to see move slowly.
>
> Over the past years, project teams also increasingly decoupled
> individual components from the "coordinated release". More and more
> components opted for an independent or intermediary-released model,
> where they can put out releases in the middle of a cycle, making new
> features available to their users. This increasingly opens up the
> possibility of a longer "coordinated release" which would still allow
> development teams to follow "release early, release often" best
> practices. All that recent evolution means it is (again) time to
> reconsider whether the 6-month cadence is what serves our community
> best, and in particular whether a longer release cadence would suit us
> better.
>
> The release management team position on the debate
> --------------------------------------------------
>
> While releasing less often would definitely reduce the load on the
> release management team, most of the team's work is automated, so we do
> not think it should be a major factor in motivating the decision. We
> should not adjust the cadence too often, though, as there is a one-time
> cost in switching our processes. In terms of impact, we expect that a
> switch to a longer cycle will encourage more project teams to adopt a
> "with-intermediary" release model (rather than the traditional "with-rc"
> single release per cycle), which may lead to abandoning the latter,
> hence simplifying our processes. Longer cycles might also discourage
> people from committing to PTL or release liaison work. We'd probably
> need to manage expectations there, and encourage more frequent switches
> (or create alternate models).
>

While most PTL positions have not been decided by elections for a while,
this is a great point to keep in mind. Fundamentally, we should keep that
process as the main means of selecting new PTLs; pushing more and more
towards handovers without even room for debate might bite us one day.

>
> If the decision is made to switch to a longer cycle, the release
> management team recommends switching directly to one year. That would
> avoid changing it again anytime soon, and synchronizing on a calendar
> year is much simpler to follow and communicate. We also recommend
> announcing the change well in advance. We currently have an opportunity
> to make the switch when we reach the end of the release naming
> alphabet, which would also greatly simplify the communications around
> the change.
>
> Finally, it is worth mentioning the impact on the stable branch work.
> Releasing less often would likely impact the number of stable branches
> that we keep maintaining, so that we do not reach too far into the past
> (and hit unmaintained distributions or long-gone dependencies). We
> currently maintain releases for 18 months before they switch to extended
> maintenance, which results in between 3 and 4 releases being maintained
> at the same time. We'd recommend switching to maintaining one-year
> releases for 24 months, which would result in between 2 and 3 releases
> being maintained at the same time. Such a change would lead to longer
> maintenance for our users while reducing backporting work for our
> developers.
>
> --
> Thierry Carrez (ttx)
> On behalf of the OpenStack Release Management team
>

In general, my 2 cents on the proposals, and a couple of ideas we might
consider for easing the pain of the current model:

I really think the slowdown (not due to a lack of work items) is real, and
my biggest concern with a longer cycle is that it would drive us to lose
even more momentum and valuable contributions (see my comment at the end
of "The current trade-off").
It would also greatly increase the pressure to backport features (no
matter how much we claim this isn't the case, we've seen it very clearly
first hand downstream when we moved our product cycle to cover multiple
upstream development cycles).
As Dan mentioned early in this thread, the downstream releases from
different sources won't align anyway, and I think a longer cycle would
contribute even more to the drive to do work downstream rather than
upstream.

In general, I hate the idea of an LTS even more than a longer cycle, as you
then effectively need to maintain two forks and put a lot of pressure on
future contributors to align them, just so that you can have yet another
forking point to start diverging again.

Perhaps there are a few things we could consider for easing the pain of
upgrades over multiple cycles and the load of the release itself, while
still balancing the momentum of the "break point":
1) Compress the last few weeks of the release, perhaps by aligning feature
freeze with RC and bringing the final library releases closer as well.
This might require us to have RCs of the clients to ease the integration
pressure. It could result in a few more RCs being tagged as issues are
discovered later, but those tags are fairly cheap.
2) Allow breaking changes (removal of anything deprecated) and disruptive
DB migrations only in the cycle released in the second half of the year,
while the first-half release focuses on bug fixes, non-intrusive feature
work, etc.
3) Move to at most a single community goal per year, targeted for
completion in that second release (as goals seem to be increasingly
disruptive in nature).

As a bonus point, I'd like to see a call to action to cut the test load
we're generating. I think our gating across the projects has some serious
redundancy (I still remember the panic when our average check and gate
runs reached the 1-hour mark). It's been great to see the efforts to cover
more with our testing, and that is very important too, but I still think
we're consuming a lot of infra resources that could be freed up
(especially to ease the last weeks before any release point) without
losing the quality of our testing.

This would give us the opportunity to have a real coordinated release as a
checkpoint to get things done, while allowing distributions and consumers
to deal with the major upgrade pain of only one release per year. It would
still give us two occasions a year to hype the new release and all the
positives that come with it, and keep the planning of work, resources and
commitments in more manageable chunks. Most importantly, apart from
disallowing breaking changes in the first release of the year, we should
treat both releases as first-class citizens, not "Development and LTS" or
anything like that, just concentrating the pain in the later one.

- Erno "jokke" Kuvaja