On Fri, Nov 5, 2021 at 2:39 PM Thierry Carrez <thierry@openstack.org> wrote:
Hi everyone,
The (long) document below reflects the current position of the release management team on a popular question: should the OpenStack release cadence be changed? Please note that we only address the release management / stable branch management facet of the problem. There are other dimensions to take into account (governance, feature deprecation, supported distros...) to get a complete view of the debate.
Introduction
------------
The subject of how often OpenStack should be released has been regularly debated in the OpenStack community. OpenStack started with a 3-month release cycle, then switched to a 6-month release cycle starting with Diablo. It is often thought of as a release management decision, but it is actually a much larger topic: a release cadence is a trade-off between pressure to release more often and pressure to release less often, coming in from a lot of different stakeholders. In OpenStack, it is ultimately a Technical Committee decision, but that decision is informed by the positions of a number of stakeholders. This document gives historical context and describes the current release management team position.
The current trade-off
---------------------
The main pressure to release more often is to make features available to users faster. Developers get a faster feedback loop, hardware vendors can ensure the software is compatible with their latest products, and users get exciting new features. "Release early, release often" is a best practice in our industry -- we should generally aim at releasing as often as possible.
But that is counterbalanced by pressure to release less often. From a development perspective, each release cycle comes with some process overhead. On the integrators' side, a new release means packaging and validation work. On the users' side, it means pressure to upgrade. To justify that cost, there needs to be enough user-visible benefit (like new features) in a given release.
For the last 10 years, that balance for OpenStack has been around six months. Six months let us accumulate enough new development that it was worth upgrading to / integrating the new version, while giving enough time to actually do the work. It also aligned well with the Foundation events cadence, allowing us to synchronize in-person developer meeting dates with the start of cycles.
For sure, I'm not speaking on behalf of every project (or I might be, but I just don't know the dynamics across projects well enough). Anyway, I think this assessment is missing one critical point: a release is the cut-off point for bikeshedding. I see a lot of genuine urgency to finally make decisions that have been going back and forth since early in the cycle, with the differences of opinion finally being resolved when we hit feature freeze / RC time. This applies not only to feature work but also to lots of bugs that do not have active community members pushing to get them fixed (and backported) early. The push in the few weeks before the RC is tagged is, and seems to have steadily been, intense. It's taxing for sure, but it also ensures that we either get things done or make a real, active decision to push them down the line for at least another half year. I think the biggest actual drawback of a longer release cycle is losing this checkpoint (let's be honest here, no one cares about milestones). Not only would a longer cycle make those hard "Are we including this work in this release or not?" decisions an even rarer occasion, but the push before RC would also intensify a lot, since we would have double the time to accumulate review debt before we actually have to have that discussion. I think more valuable contributions would be lost through loss of traction (and people just deciding to carry the patches as one more downstream-only thing because the community can't get around to them).
What changed
------------
The major recent change affecting this trade-off is that the pace of new development in OpenStack has slowed down. The rhythm of changes was divided by 3 between 2015 and 2021, reflecting that OpenStack is now a mature and stable solution, where accessing the latest features is no longer a major driver. That reduces some of the pressure for releasing more often. At the same time, we have more users every day, with larger and larger deployments, and keeping those clusters constantly up to date is an operational challenge. That increases the pressure to release less often. In essence, OpenStack is becoming much more like an LTS distribution than a web browser -- something users like to move slowly.
Over the past years, project teams also increasingly decoupled individual components from the "coordinated release". More and more components opted for an independent or intermediary-released model, where they can put out releases in the middle of a cycle, making new features available to their users. This increasingly opens up the possibility of a longer "coordinated release" which would still allow development teams to follow "release early, release often" best practices. All that recent evolution means it is (again) time to reconsider if the 6-month cadence is what serves our community best, and in particular if a longer release cadence would not suit us better.
The release management team position on the debate
--------------------------------------------------
Releasing less often would definitely reduce the load on the release management team, but since most of the team's work is automated, we do not think it should be a major factor in motivating the decision. We should not adjust the cadence too often though, as there is a one-time cost in switching our processes. In terms of impact, we expect that a switch to a longer cycle will encourage more project teams to adopt a "with-intermediary" release model (rather than the traditional "with-rc" single release per cycle), which may lead to abandoning the latter, hence simplifying our processes. Longer cycles might also discourage people from committing to PTL or release liaison work. We'd probably need to manage expectations there, and encourage more frequent switches (or create alternate models).
While most PTL positions have not been decided by elections for a while, this is a great point to keep in mind. Fundamentally, we should keep that process as the main means of selecting new PTLs; pushing more and more towards handovers, without even room for debate, might bite us one day.
If the decision is made to switch to a longer cycle, the release management team recommends switching directly to one year. That would avoid changing it again anytime soon, and synchronizing on a calendar year is much simpler to follow and communicate. We also recommend announcing the change well in advance. We currently have an opportunity to make the switch when we reach the end of the release naming alphabet, which would also greatly simplify the communications around the change.
Finally, it is worth mentioning the impact on the stable branch work. Releasing less often would likely affect the number of stable branches that we keep maintaining, so that we do not go too far into the past (and hit unmaintained distributions or long-gone dependencies). We currently maintain releases for 18 months before they switch to extended maintenance, which results in between 3 and 4 releases being maintained at the same time. We'd recommend maintaining one-year releases for 24 months, which would result in between 2 and 3 releases being maintained at the same time. Such a change would lead to longer maintenance for our users while reducing backporting work for our developers.
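To make the "3 to 4" and "2 to 3" figures concrete, here is a small arithmetic sketch (not part of the original proposal) that counts how many releases sit inside their maintenance window at any moment. It assumes releases land exactly every `cadence` months and that a release counts as maintained from its release day through the last day of its window inclusive; because the window is a multiple of the cadence, that last day coincides with a new release, which produces the upper figure. The function name `concurrent_maintained` and the 120-month horizon are hypothetical choices for illustration.

```python
def concurrent_maintained(cadence: int, window: int, horizon: int = 120) -> tuple[int, int]:
    """Return (min, max) releases under maintenance at once (hypothetical helper).

    Assumption: a release published at month r is maintained while
    r <= t <= r + window, i.e. both ends of the window are inclusive.
    """
    releases = range(0, horizon + 1, cadence)  # release dates, in months
    counts = []
    # Sample every half month, starting once a full window of history exists,
    # so only the steady state is measured. Work in half-months to stay integral.
    for half_month in range(2 * window, 2 * horizon):
        n = sum(1 for r in releases if 0 <= half_month - 2 * r <= 2 * window)
        counts.append(n)
    return min(counts), max(counts)


print(concurrent_maintained(6, 18))   # current model: (3, 4)
print(concurrent_maintained(12, 24))  # proposed model: (2, 3)
```

The upper bound is simply window / cadence + 1 (the overlap day), and the lower bound is window / cadence, which is why the proposed model keeps one fewer branch alive while extending each branch's maintenance from 18 to 24 months.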
--
Thierry Carrez (ttx)
On behalf of the OpenStack Release Management team
In general, my 2 cents on the proposals, and a couple of ideas we might consider for easing the pain of the current model:
I really think the slowing phase (not due to lack of work items) is real, and the biggest concern I have with a longer cycle is that it would drive us to lose even more momentum and valuable contributions (see my comment at the end of "The current trade-off"). It would also greatly increase the pressure to backport features (no matter how much we claim this is not the case, we've seen it very clearly first-hand downstream when we moved our product cycle to cover multiple upstream development cycles). As Dan mentioned early in this thread, the downstream releases from different sources won't align anyway, and I think a longer cycle would contribute even more to the drive to do work downstream rather than upstream. In general, I hate the idea of an LTS even more than a longer cycle, as you then effectively need to maintain two forks and put a lot of pressure on future contributors to align them, just so that you can have yet another forking point to start diverging again.
Perhaps a couple of things we could consider for easing the pain of upgrades over multiple cycles and the load of the release itself, while still keeping the momentum of the "break point":
1) Compress the last few weeks of the release, perhaps by aligning feature freeze with RC and bringing the final library releases closer as well. This might require us to have RCs of the clients to ease the integration pressure. This could result in a few more RCs being tagged as things are discovered later, but those tags are fairly cheap.
2) Have any breaking changes (removal of anything deprecated) and disruptive DB migrations happen only in the cycle released in the second half of the year, while the first half would focus on bug fixes, non-intrusive feature work, etc.
3) Move to at most a single community goal per year, targeted to be finished in that second release (as they seem to be more and more disruptive in nature).
As a bonus point, I'd like to see a call to action to cut the test load we're generating. I think our gating across the projects has some serious redundancy in it (I still remember the panic when our average check and gate runs reached the 1hr mark). It's been great to see the efforts to cover more with our testing, and that is very important too, but I still think we're eating a lot of infra resources that could be freed up (especially to ease the last weeks before any release point) without losing the quality of our testing.
Together, the three changes proposed above would give us the opportunity to have a real coordinated release as a checkpoint to get things done, while allowing distributions and consumers to worry about the major upgrade pain of only one release per year. It would still give us two occasions a year to hype the new release and all the positives that come with it, and keep the planning of work, resources and commitments in more manageable chunks. Most importantly, apart from not allowing breaking changes in the first release of the year, we should treat both of them as first-class releases, not "Development and LTS" or anything like that, just concentrating the pain in the later one.
- Erno "jokke" Kuvaja