[all][tc] Relmgt team position on release cadence
Hi everyone,

The (long) document below reflects the current position of the release management team on a popular question: should the OpenStack release cadence be changed? Please note that we only address the release management / stable branch management facet of the problem. There are other dimensions to take into account (governance, feature deprecation, supported distros...) to get a complete view of the debate.

Introduction
------------

The subject of how often OpenStack should be released has been regularly debated in the OpenStack community. OpenStack started with a 3-month release cycle, then switched to a 6-month release cycle starting with Diablo. It is often thought of as a release management decision, but it is actually a much larger topic: a release cadence is a trade-off between pressure to release more often and pressure to release less often, coming from a lot of different stakeholders. In OpenStack, it is ultimately a Technical Committee decision. But that decision is informed by the positions of a number of stakeholders. This document gives historical context and describes the current release management team position.

The current trade-off
---------------------

The main pressure to release more often is to make features available to users faster. Developers get a faster feedback loop, hardware vendors ensure software is compatible with their latest products, and users get exciting new features. "Release early, release often" is a best practice in our industry -- we should generally aim at releasing as often as possible.

But that is counterbalanced by pressure to release less often. From a development perspective, each release cycle comes with some process overhead. On the integrators' side, a new release means packaging and validation work. On the users' side, it means pressure to upgrade. To justify that cost, there needs to be enough user-visible benefit (like new features) in a given release.

For OpenStack, that balance has been around six months for the last 10 years. Six months let us accumulate enough new development that it was worth upgrading to / integrating the new version, while giving us enough time to actually do the work. It also aligned well with the Foundation events cadence, allowing us to synchronize in-person developer meetings with the start of cycles.

What changed
------------

The major recent change affecting this trade-off is that the pace of new development in OpenStack slowed down. The volume of changes was divided by 3 between 2015 and 2021, reflecting that OpenStack is now a mature and stable solution, where access to the latest features is no longer a major driver. That reduces some of the pressure to release more often. At the same time, we have more users every day, with larger and larger deployments, and keeping those clusters constantly up to date is an operational challenge. That increases the pressure to release less often. In essence, OpenStack is becoming much more like an LTS distribution than a web browser -- something users prefer to move slowly.

Over the past years, project teams also increasingly decoupled individual components from the "coordinated release". More and more components opted for an independent or with-intermediary release model, where they can put out releases in the middle of a cycle, making new features available to their users. This increasingly opens up the possibility of a longer "coordinated release" which would still allow development teams to follow "release early, release often" best practices.
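For those less familiar with the mechanics: each deliverable declares its release model in the openstack/releases repository, so switching models is essentially a one-line change. A minimal sketch of such a deliverable file (the project name, version and hash are made up for illustration):

    ---
    launchpad: example-service
    team: example-team
    type: service
    release-model: cycle-with-intermediary   # or cycle-with-rc, or independent
    releases:
      - version: 5.1.0
        projects:
          - repo: openstack/example-service
            hash: 0123456789abcdef0123456789abcdef01234567

A team on cycle-with-intermediary can therefore keep shipping mid-cycle releases regardless of how long the coordinated cycle itself is.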
All that recent evolution means it is (again) time to reconsider whether the 6-month cadence is what serves our community best, and in particular whether a longer release cadence would not suit us better.

The release management team position on the debate
--------------------------------------------------

While releasing less often would definitely reduce the load on the release management team, most of the team's work is automated, so we do not think it should be a major factor in motivating the decision. We should not adjust the cadence too often though, as there is a one-time cost in switching our processes.

In terms of impact, we expect that a switch to a longer cycle will encourage more project teams to adopt a "with-intermediary" release model (rather than the traditional "with-rc" single release per cycle), which may lead to abandoning the latter, hence simplifying our processes. Longer cycles might also discourage people from committing to PTL or release liaison work. We'd probably need to manage expectations there, and encourage more frequent switches (or create alternate models).

If the decision is made to switch to a longer cycle, the release management team recommends switching directly to one year. That would avoid changing it again anytime soon, and synchronizing on a calendar year is much simpler to follow and communicate. We also recommend announcing the change well in advance. We currently have an opportunity to make the switch when we reach the end of the release naming alphabet, which would also greatly simplify the communications around the change.

Finally, it is worth mentioning the impact on the stable branch work. Releasing less often would likely impact the number of stable branches that we keep maintaining, so that we do not go too far into the past (and hit unmaintained distributions or long-gone dependencies). We currently maintain releases for 18 months before they switch to extended maintenance, which results in between 3 and 4 releases being maintained at the same time. We'd recommend switching to maintaining one-year releases for 24 months, which would result in between 2 and 3 releases being maintained at the same time. Such a change would lead to longer maintenance for our users while reducing backporting work for our developers.

--
Thierry Carrez (ttx)
On behalf of the OpenStack Release Management team
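As a quick back-of-the-envelope check of the branch counts above (assuming branches move to extended maintenance exactly on schedule):

    concurrent stable branches ~= maintenance window / release cadence
    today:    18 months / 6-month cadence  = 3  (briefly 4 around each release)
    proposed: 24 months / 12-month cadence = 2  (briefly 3 around each release)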
On Fri, Nov 5, 2021 at 10:39 AM Thierry Carrez <thierry@openstack.org> wrote:
[...]
Thanks for the write up Thierry.

I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.

It can complicate things a bit from a CI and project management side, but I think it could solve the problem for both sides: those who want new features + those who want stability?
-- Mohammed Naser VEXXHOST, Inc.
On 2021-11-05 11:53:25 -0400 (-0400), Mohammed Naser wrote: [...]
I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.

It can complicate things a bit from a CI and project management side, but I think it could solve the problem for both sides: those who want new features + those who want stability?
This is really just another way of suggesting we solve the skip-level upgrades problem, since we can't really test fast-forward upgrades through so-called "non-LTS" versions once we abandon them. -- Jeremy Stanley
On Fri, 2021-11-05 at 16:18 +0000, Jeremy Stanley wrote:
On 2021-11-05 11:53:25 -0400 (-0400), Mohammed Naser wrote: [...]
I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.

It can complicate things a bit from a CI and project management side, but I think it could solve the problem for both sides: those who want new features + those who want stability?
This is really just another way of suggesting we solve the skip-level upgrades problem, since we can't really test fast-forward upgrades through so-called "non-LTS" versions once we abandon them.

Well, realistically I don't think the customers that are pushing us to support skip-level upgrades or fast-forward upgrades will be able to work with a cadence of 1 release a year, so I would expect us to still need to consider skip-level upgrades between LTS-2 and the new LTS.
We have several customers that need at least 12 months to complete certification of all of their workloads on a new cloud, so OpenStack distros will still have to support those customers that really need a 2-yearly or longer upgrade cadence even if we had an LTS release every year. There are many other users of OpenStack that can effectively live at head -- CERN and VEXXHOST being two examples that the 1-year cycle might suit well -- but for our telco and financial customers, 12 months is still a short upgrade horizon.
On Fri, 2021-11-05 at 11:53 -0400, Mohammed Naser wrote:
[...]
Thanks for the write up Thierry.
I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.

If we were to introduce LTS releases we would have to agree on what they were as a community, and we would need to support rolling upgrades between LTS versions.
That would require all distributed projects like nova to ensure that LTS-to-LTS RPC and DB compatibility is maintained, instead of the current N+1 guarantees we have today. I know that would make some downstream happy, as perhaps we could align our FFU support with the LTS cadence, but I would not hold my breath on that. As a developer I would personally prefer to have shorter cycles upstream with upgrades supported across more than N+1 -- e.g. release every 2 months but keep rolling upgrade compatibility for at least 12 months, or something like that. The release-with-intermediary lifecycle can enable that while still allowing us to have a longer or shorter planning horizon depending on the project and its velocity.
It can complicate things a bit from a CI and project management side, but I think it could solve the problem for both sides: those who want new features + those who want stability?
It might, but I suspect that it will still not align with distros: Canonical have a new LTS every 2 years, and Red Hat has a new release every ~18 months or so, based on every 3rd release. The LTS idea I think has merit, but we would likely have to maintain at least 2 LTS releases in parallel to make it work. So something like 1 LTS release a year, maintained for 2 years, with normal releases every 6 months that are only maintained for 6 months. Each project would keep rolling upgrade compatibility, ideally between LTS releases rather than N+1, as a new minimum. The implication of this is that we would want grenade jobs testing latest-LTS-to-master upgrade compatibility in addition to N-to-N+1, where those differ, as sketched below.
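A rough sketch of what such an extra job could look like -- the job name is made up, and the grenade_from_branch variable is an assumption about how the grenade Zuul jobs select the starting branch:

    - job:
        name: grenade-lts-to-master
        parent: grenade
        description: |
          Deploy the most recent LTS release, then upgrade to master,
          complementing the regular N to N+1 grenade job.
        vars:
          # Hypothetical "latest LTS" branch, bumped whenever a new
          # LTS is cut.
          grenade_from_branch: stable/xena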
I know that would make some downstream happy, as perhaps we could align our FFU support with the LTS cadence, but I would not hold my breath on that.
Except any downstream that is unable to align on the LTS schedule either permanently or temporarily would have to wait a full extra year to resync, which would make them decidedly unhappy I think. I'm sure some distros have had to realign downstream releases to "the next" upstream one more than once, so... :)
As a developer I would personally prefer to have shorter cycles upstream with upgrades supported across more than N+1 -- e.g. release every 2 months but keep rolling upgrade compatibility for at least 12 months, or something like that. The release-with-intermediary lifecycle can enable that while still allowing us to have a longer or shorter planning horizon depending on the project and its velocity.
This has the same problem as you highlighted above, which is that we all have to agree on the same 12 months that we're supporting that span, otherwise this collapses to just the intersection of any two projects' windows. --Dan
On 2021-11-05 17:47:13 +0000 (+0000), Sean Mooney wrote: [...]
If we were to introduce LTS releases we would have to agree on what they were as a community, and we would need to support rolling upgrades between LTS versions [...]
Yes, but what about upgrades between LTS and non-LTS versions (from or to)? Do we test all those as well? And if we don't, are users likely to want to use the non-LTS versions at all knowing they might be unable to cleanly update from them to an LTS version later on?
so something like 1 LTS release a year, maintained for 2 years, with normal releases every 6 months that are only maintained for 6 months [...]
To restate what I said in my other reply, this assumes a future where skip-level upgrades are possible. Otherwise what happens with a series of releases like A,b,C,d,E where A/C/E are the LTS releases and b/d are the non-LTS releases and someone who's using A wants to upgrade to C but we've already stopped maintaining b and can't guarantee it's even installable any longer? If the LTS idea is interesting to people, then we should take a step back and work on switching from FFU to SLU first. If we can't solve that, then there's no point to having non-LTS releases. -- Jeremy Stanley
Hi,

On Friday, 5 November 2021 18:47:13 CET Sean Mooney wrote:
[...]
I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.
If we were to introduce LTS releases we would have to agree on what they were as a community, and we would need to support rolling upgrades between LTS versions.
That would require all distributed projects like nova to ensure that LTS-to-LTS RPC and DB compatibility is maintained, instead of the current N+1 guarantees we have today.
Not only that, but cross-project communication, e.g. nova <-> neutron, also needs to work fine between such LTS releases.
[...]
-- Slawek Kaplonski Principal Software Engineer Red Hat
On Fri, Nov 5, 2021 at 5:04 PM Mohammed Naser <mnaser@vexxhost.com> wrote:
[...]
Thanks for the write up Thierry.
++ very well written!
I wonder what the community's thoughts are on having LTS + normal releases so that we can have the power of both? I guess that is essentially what we have with EM, but I guess we could introduce a way to ensure that operators can just upgrade LTS to LTS.
This is basically what Ironic does: we release with the rest of OpenStack, but we also do 2 more releases per cycle with their own bugfix/X.Y branches. Dmitry
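For reference, this is roughly how one of those intermediate releases and its bugfix branch might be expressed in an openstack/releases deliverable file; the version number and hash are illustrative, and the exact branch tooling may differ:

    releases:
      - version: 18.1.0
        projects:
          - repo: openstack/ironic
            hash: 0123456789abcdef0123456789abcdef01234567
    branches:
      - name: bugfix/18.1
        location: 18.1.0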
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
On 2021-11-06 16:22:23 +0100 (+0100), Dmitry Tantsur wrote: [...]
This is basically what Ironic does: we release with the rest of OpenStack, but we also do 2 more releases per cycle with their own bugfix/X.Y branches. [...]
Do you expect users to be able to upgrade between those, and if so is that tested? -- Jeremy Stanley
Hi, On Sat, Nov 6, 2021 at 7:32 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2021-11-06 16:22:23 +0100 (+0100), Dmitry Tantsur wrote: [...]
This is basically what Ironic does: we release with the rest of OpenStack, but we also do 2 more releases per cycle with their own bugfix/X.Y branches. [...]
Do you expect users to be able to upgrade between those, and if so is that tested?
We prefer to think that upgrades are supported, and we're ready to fix bugs when they arise, but we don't actively test that. It's not that we don't want to; it's mostly because of understaffing, CI stability, and the fact that grenade is painful enough as it is.

Dmitry
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
---- On Fri, 05 Nov 2021 09:26:13 -0500 Thierry Carrez <thierry@openstack.org> wrote ----
[...]
Thanks Thierry for the detailed write up.

At the same time, a shorter release leads to upgrade-often pressure but contains fewer changes/features, which makes each upgrade easier; a longer-release model will have more changes/features, which will make each upgrade more complex.
[...]
Finally, it is worth mentioning the impact on the stable branch work. Releasing less often would likely impact the number of stable branches that we keep maintaining, so that we do not go too far into the past (and hit unmaintained distributions or long-gone dependencies). We currently maintain releases for 18 months before they switch to extended maintenance, which results in between 3 and 4 releases being maintained at the same time. We'd recommend switching to maintaining one-year releases for 24 months, which would result in between 2 and 3 releases being maintained at the same time. Such a change would lead to longer maintenance for our users while reducing backporting work for our developers.
Yeah, if we switch to a one-year release model then we definitely need to change the stable support policy. For example, do we need an extended maintenance phase at all if we support a release for 24 months? And if we keep the EM phase too, the important thing to note is that the EM phase involves almost the same amount of work that upstream developers are spending nowadays in terms of testing and backports (even though we have an agreement to reduce the effort for EM stable branches when needed, I do not see that happening, and we end up doing the same amount of maintenance there as we do for supported stable branches). As the yearly release model extends the stable support window, and with our current situation of the stable team shrinking, it is an open question whether we as a community will be able to support the new stable release window.

Another point we need to consider is how it will impact contribution support from companies and volunteer contributors (we might not have many volunteer contributors now, so we can ignore that, but let's consider companies' support). For example, the foundation membership contract does not have contribution requirements, so companies' contribution support is always voluntary or based on their customer needs. In that case, we need to think about how we can keep that without any impact: for example, change the foundation membership requirements, or get companies' feedback on whether it impacts their contribution support policy.

-gmann
Ghanshyam Mann wrote:
[...] Thanks Thierry for the detailed write up.
At the same time, a shorter release leads to upgrade-often pressure but contains fewer changes/features, which makes each upgrade easier; a longer-release model will have more changes/features, which will make each upgrade more complex.
I think that was true a few years ago, but I'm not convinced that still holds. We currently have a third of the change volume we had back in 2015, so a one-year release in 2022 would contain far fewer changes than a 6-month release from 2015.

Also, thanks to our testing and our focus on stability, the pain linked to the amount of breaking changes in a release is now negligible compared to the basic pain of going through a 1M-core deployment and upgrading the various pieces... every 6 months. I've heard of multiple users claiming it takes them close to 6 months to upgrade their massive deployments to a new version. So when they are done, they have to start again.

--
Thierry Carrez (ttx)
On Mon, Nov 8, 2021 at 10:44 AM Thierry Carrez <thierry@openstack.org> wrote:
Ghanshyam Mann wrote:
[...] Thanks Thierry for the detailed write up.
At the same time, a shorter release leads to upgrade-often pressure but contains fewer changes/features, which makes each upgrade easier; a longer-release model will have more changes/features, which will make each upgrade more complex.
I think that was true a few years ago, but I'm not convinced that still holds. We currently have a third of the change volume we had back in 2015, so a one-year release in 2022 would contain far fewer changes than a 6-month release from 2015.
I concur. Also, in 2015, we were still very much in a "move fast" mode of operation as a community.
Also, thanks to our testing and our focus on stability, the pain linked to the amount of breaking changes in a release is now negligible compared to the basic pain of going through a 1M-core deployment and upgrading the various pieces... every 6 months. I've heard of multiple users claiming it takes them close to 6 months to upgrade their massive deployments to a new version. So when they are done, they have to start again.
I've been hearing the exact same messaging from larger operators, as well as operators in environments where they are concerned about managing risk, for at least the past two years. These operators have indicated it is not uncommon for the upgrade projects which consume, test, certify for production, and deploy to production to take *at least* six months to execute. At the same time, they are shy of being the ones to also "find all of the bugs", and so the project doesn't actually start until well after the new coordinated release has occurred. Quickly they become yet another version behind with this pattern.

I suspect it is really easy for us as a CI-focused community to think that six months is plenty of time to roll out a fully updated deployment which has been fully tested in every possible way. Except, these operators are often trying to do just that on physical hardware, with updated firmware and operating systems bringing in new variables with every single change, which may ripple up the entire stack. These operators then have to apply the lessons they have previously learned once they have worked through all of the variables. In some cases this may involve aspects such as benchmarking, to ensure they don't need to make additional changes which need to be factored into their deployment, sending them back to the start of their testing. All while thinking of phrases like "business/mission critical".

I guess this means I'm in support of revising the release cycle. At the same time, I think it would be wise for us to see if we can learn from these operators the pain points they experience, the process they leverage, and ultimately see if there are opportunities to spread knowledge or potentially tooling. Or maybe even get them to contribute their patches upstream. Not that all of these issues are easily solved with any level of code, but sometimes they can include contextual disconnects, and resolving those is just as important as shipping a release, IMHO.

-Julia
Hey,

I'd like to add my 2 cents. It's hard to upgrade a region, so when it comes to upgrading multiple regions, it's even harder. Some operators also have their own downstream patches / extensions / drivers which make the upgrade process more complex, so it takes more time (for all the reasons already given in the thread: the need to update the CI, the tools, the doc, the people, etc).

One more thing is about consistency: when you have to manage multiple regions, it's easier if all of them are pretty much identical. Human operations are always the same, and can eventually be automated. This leads to staying on a fixed version of OpenStack to run the business. When scaling, you (we) always choose security and consistency.

Also, Julia mentioned something true about contributions from operators. It's difficult for them for multiple reasons:
- Pushing upstream is a process, which needs to be taken into account when working on an internal fix.
- It's usually quicker to push downstream because it's needed. When it comes to upstream, it's challenged by the developers (and that's good), so it takes time and can be discouraging.
- Operators are not running master, but a stable release. Bugs on stable branches could be fixed differently than on master, which could also be discouraging.
- Writing unit tests is a job; some tech operators are not necessarily developers, so this could also be a challenge.

All of this to say that helping people who are proposing a patch is a good thing. And as far as I can see, upstream developers are helping most of the time, and we should keep and encourage such behavior IMHO.

Finally, I would also vote for fewer releases or LTS releases (though the latter looks heavier to maintain). I think this would help keeping up to date with stable branches and encourage more patches from operators.

Cheers,
Arnaud.

On 8 November 2021 20:43:18 GMT+01:00, Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Mon, Nov 8, 2021 at 10:44 AM Thierry Carrez <thierry@openstack.org> wrote:
Ghanshyam Mann wrote:
[...] Thanks Thierry for the detailed write up.
At the same time, a shorter release which leads to upgrade-often pressure but it will have fewer number of changes/features, so make the upgrade easy and longer-release model will have more changes/features that will make upgrade more complex.
I think that was true a few years ago, but I'm not convinced that still holds. We currently have a third of the changes volume we had back in 2015, so a one-year release in 2022 would contain far less changes than a 6-month release from 2015.
I concur. Also, in 2015, we were still very much in a "move fast" mode of operation as a community.
Also, thanks to our testing and our focus on stability, the pain linked to the amount of breaking changes in a release is now negligible compared to the basic pain of going through a 1M-core deployment and upgrading the various pieces... every 6 months. I've heard of multiple users claiming it takes them close to 6 months to upgrade their massive deployments to a new version. So when they are done, they have to start again.
-- Thierry Carrez (ttx)
I've been hearing the exact same messaging from larger operators as well as operators in environments where they are concerned about managing risk for at least the past two years. These operators have indicated it is not uncommon for the upgrade projects which consume, test, certify for production, and deploy to production take *at least* six months to execute. At the same time, they are shy of being the ones to also "find all of the bugs", and so the project doesn't actually start until well after the new coordinated release has occurred. Quickly they become yet another version behind with this pattern.
I suspect it is really easy for us as a CI-focused community to think that six months is plenty of time to roll out a fully updated deployment which has been fully tested in every possible way. Except these operators are often trying to do just that on physical hardware, with updated firmware and operating systems bringing in new variables with every single change, which may ripple up the entire stack. These operators then have to apply the lessons they have previously learned once they have worked through all of the variables. In some cases this may involve aspects such as benchmarking, to ensure they don't need to make additional changes which would need to be factored into their deployment, sending them back to the start of their testing. All while thinking of phrases like "business/mission critical".
I guess this means I'm in support of revising the release cycle. At the same time, I think it would be wise for us to see if we can learn from these operators the pain points they experience and the process they leverage, and ultimately see if there are opportunities to spread knowledge or potentially tooling. Or maybe even get them to contribute their patches upstream. Not that all of these issues are easily solved with any level of code, but sometimes they can include contextual disconnects, and resolving those is just as important as shipping a release, IMHO.
-Julia
Hi, it's time again to discuss the release cycle... Just considering the number of times that we have been discussing the release cycle lately, we should acknowledge that we really have a problem, or at least that we have very different opinions in the community, and we should discuss it openly. Thanks Thierry for bringing the topic up again.

Looking into the last user survey we see that 23% of the deployments are running the last two releases, and then we have a long... long... tail with older releases. Honestly, I have mixed feelings about it! As an operator I relate more to having an LTS release and the possibility to upgrade between LTS releases. But having the possibility to upgrade every 6 months is also very interesting for the small and fast-moving projects. Maybe a 1-year release cycle would provide the middle ground here.

In our cloud infrastructure we run different releases, from Stein to Victoria. There are projects that we can easily upgrade (and we do it!) and other projects that are much more complicated (because of feature deprecations, operating system dependencies, internal patches, or simply because it is too risky considering the current workloads). For those we definitely need more than 6 months for the upgrade.

If again we don't reach a consensus to change the release cycle, at least we should continue to work on improving the upgrade experience (and don't get me wrong... the upgrade experience has improved tremendously over the years). There are small things that change in the projects (most of them are good refactors) but that can be a big headache for upgrades. Let me enumerate some: DB schema changes (which usually translate into offline upgrades), configuration changes (options that move to different configuration groups without bringing anything new, changed defaults, policy changes), architecture changes (new projects that are now mandatory), ...

In my opinion, if we reduce those, or at least are more aware of the challenges that they impose on operators, we will make upgrades easier and hopefully see deployments move much faster, whatever the release cycle is. cheers, Belmiro
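On the configuration-changes point: oslo.config already lets a project move or rename an option while keeping the old name working, so a rename does not have to become an upgrade blocker. A minimal sketch of that pattern (the option and group names below are illustrative, not taken from any specific project):

    from oslo_config import cfg

    # The option used to live in [DEFAULT] as "sql_connection"; it now
    # lives in [database] as "connection". Declaring the old location as
    # deprecated keeps existing configuration files working while
    # operators migrate at their own pace.
    database_opts = [
        cfg.StrOpt('connection',
                   secret=True,
                   deprecated_name='sql_connection',
                   deprecated_group='DEFAULT',
                   help='The SQLAlchemy connection string for the database.'),
    ]

    cfg.CONF.register_opts(database_opts, group='database')

When the deprecated aliases are kept for at least one full cycle, an "option that moves" becomes a warning in the logs rather than a broken service at upgrade time.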
On Tue, Nov 9, 2021 at 6:50 AM Belmiro Moreira <moreira.belmiro.email.lists@gmail.com> wrote:
There are small things that change in the projects (most of them are good refactors) but that can be a big headache for upgrades. Let me enumerate some: DB schema changes (which usually translate into offline upgrades), configuration changes (options that move to different configuration groups without bringing anything new, changed defaults, policy changes), architecture changes (new projects that are now mandatory), ...
This is the kind of contextual reminder that needs to come up frequently. Is there any chance of conveying how long the outages are for a deployment of your size, in your experience, with your level of risk tolerance? The same goes for the human/operational impact of working through aspects like configuration options changing/moving, policy changes, architectural changes, and new projects being mandatory. My hope is that we convey some sense of "what it really takes", to provide context in which contributors making changes understand, at least at a high level, how their changes may impact others.
- it's usually quicker to push downstream because it's needed. When it comes to upstream, it's challenged by the developers (and that's good), so it takes time and can be discouraging.
I'm sure many operators push downstream first, and then chuck a patch into upstream gerrit in hopes of it landing upstream so they don't have to maintain it long-term. Do you think the possibility of it not landing for a year (if they make it in the first one) or two (if it goes into the next one) is a disincentive to pushing upstream? I would think it might push it past the event horizon, making downstream patches more of a constant.
- writing unit tests is a job; some tech operators are not necessarily developers, so this could also be a challenge.
Yep, and my experience is that this sort of "picking up the pieces" of good fixes that need help from another developer happens mostly at the end of the release, post-FF in a lot of cases. This is the time when the pressure of the pending release is finally on and we get around to this sort of task. Expanding the release window increases the number of these things collected per cycle, and delays them being in a release by a long time. I know, we should just "do better" for the earlier parts of the cycle, but realistically that won't happen :) --Dan
On 11/9/21 4:17 PM, Dan Smith wrote:
- it's usually quicker to push downstream because it's needed. When it comes to upstream, it's challenged by the developers (and that's good), so it takes time and can be discouraging.
I'm sure many operators push downstream first, and then chuck a patch into upstream gerrit in hopes of it landing upstream so they don't have to maintain it long-term. Do you think the possibility of it not landing for a year (if they make it in the first one) or two (if it goes into the next one) is a disincentive to pushing upstream?
Don't ask your colleagues upstream about what we do! :) With my Debian package maintainer hat on: I will continue to send patches upstream whenever I can.
Yep, and my experience is that this sort of "picking up the pieces" of good fixes that need help from another developer happens mostly at the end of the release, post-FF in a lot of cases. This is the time when the pressure of the pending release is finally on and we get around to this sort of task.
It used to be that I was told to add unit tests, open a bug and close it, etc., and if I didn't do it, the patch would just stay open forever. But that was the early days of OpenStack...
From a packager's view, that's not what I experienced. Mostly, upstream OpenStack people are nice, and understand that we (package maintainers) just jump from one package to another, and can't afford more than 15 minutes per package upgrade (considering upgrading to Xena meant upgrading 220 packages...). I've seen upstream projects take over one of my patches numerous times, finishing the work (sometimes adding unit tests) and making the patch land (sometimes even backporting it to earlier releases).
I don't think switching to a 1-year release cycle will change anything regarding the distro <-> upstream relationship. Hopefully, OpenStack people will continue to be awesome and nice to work with... :) Cheers, Thomas Goirand (zigo)
Hello Thierry (and all others), First, thanks for the recap. On Fri, Nov 5, 2021, at 15:26, Thierry Carrez wrote:
The (long) document below reflects the current position of the release management team on a popular question: should the OpenStack release cadence be changed? Please note that we only address the release management / stable branch management facet of the problem. There are other dimensions to take into account (governance, feature deprecation, supported distros...) to get a complete view of the debate.
I think it's time to have a conversation with all the parties to move this forward, taking more than one dimension into account. It would be sad if we can't progress all together.
The main pressure to release more often is to make features available to users faster. Developers get a faster feedback loop, hardware vendors ensure software is compatible with their latest products, and users get exciting new features. "Release early, release often" is a best practice in our industry -- we should generally aim at releasing as often as possible.
My view is that openstack projects are very well tested together nowadays. This test coverage reduces the need for "coordinated releases" with larger testing... to the point that some operators are (al)ready to consume the master branch. So, for those who need the latest features early (in a long-term fashion), there are two choices, regardless of the release & branching cycle: stay on the rolling-forward branch (master), or manage your own fork of the code. That choice wasn't really possible in the early days of openstack without taking larger risks.
But that is counterbalanced by pressure to release less often. From a development perspective, each release cycle comes with some process overhead. On the integrators side, a new release means packaging and validation work. On the users side, it means pressure to upgrade. To justify that cost, there needs to be enough user-visible benefit (like new features) in a given release.
Very good summary.
For the last 10 years for OpenStack, that balance has been around six months. Six months let us accumulate enough new development that it was worth upgrading to / integrating the new version, while giving enough time to actually do the work. It also aligned well with Foundation events cadence, allowing to synchronize in-person developer meetings date with start of cycles.
I think we're hitting something here.
The major recent change affecting this trade-off is that the pace of new development in OpenStack slowed down. The rhythm of changes was divided by 3 between 2015 and 2021, reflecting that OpenStack is now a mature and stable solution, where accessing the latest features is no longer a major driver. That reduces some of the pressure for releasing more often. At the same time, we have more users every day, with larger and larger deployments, and keeping those clusters constantly up to date is an operational challenge. That increases the pressure to release less often. In essence, OpenStack is becoming much more like a LTS distribution than a web browser -- something users like moving slow.
Over the past years, project teams also increasingly decoupled individual components from the "coordinated release". More and more components opted for an independent or intermediary-released model, where they can put out releases in the middle of a cycle, making new features available to their users. This increasingly opens up the possibility of a longer "coordinated release" which would still allow development teams to follow "release early, release often" best practices. All that recent evolution means it is (again) time to reconsider if the 6-month cadence is what serves our community best, and in particular if a longer release cadence would not suit us better.
Again, thanks to the increase in testability (projects tested together), it could be time for us to step away from the whole model of the coordinated release, which is IMO part of this problem. I feel it's okay for a project to release when they are ready / have something to release. What's holding us back from doing that? Again, if we stop pushing this "artificial" release model, we'll stop the branching effort that causes overhead work. It's not bringing any value to the ecosystem anymore.
While releasing less often would definitely reduce the load on the release management team, most of the team's work being automated, we do not think it should be a major factor in motivating the decision. We should not adjust the cadence too often though, as there is a one-time cost in switching our processes. In terms of impact, we expect that a switch to a longer cycle will encourage more project teams to adopt a "with-intermediary" release model (rather than the traditional "with-rc" single release per cycle), which may lead to abandoning the latter, hence simplifying our processes. Longer cycles might also discourage people from committing to PTL or release liaison work. We'd probably need to manage expectations there, and encourage more frequent switches (or create alternate models).
From a consumer perspective, I feel it's okay to reduce the cadence of a "coordinated release" to a year. However, I think it's not the right path forward _without other changes_ (see my comment above about reducing the number of branches). If the release work hasn't changed since I was still on the team, a longer cycle means more patches to review inside a single release. Of course, less activity in OpenStack has a counter-balancing effect here. I just believe it's better to release _more often_ than not. But _branching_ should be reduced as much as possible; that's the costly part (from what I have seen; tell me if I am wrong). I don't see any value in making longer releases for the sake of it. I don't see the reason for multiplying the number of branches and upgrade paths to maintain.
If the decision is made to switch to a longer cycle, the release management team recommends switching to one year directly. That would avoid changing it again anytime soon, and synchronizing on a calendar year is much simpler to follow and communicate. We also recommend announcing the change well in advance. We currently have an opportunity to make the switch when we reach the end of the release naming alphabet, which would also greatly simplify the communications around the change.
Wouldn't it be easier to reduce branching altogether, branching only when necessary, and let projects branch when they need to? If we define strict rules for branching (and limit the annoying bits for the consumers), it will increase the quality of the ecosystem IMO. It will also be easier to manage from a packager perspective. Next to that, indeed, a "coordinated release" once a year sounds like a good idea for our users ("I am using OpenStack edition 2021").
Finally, it is worth mentioning the impact on the stable branch work. Releasing less often would likely impact the number of stable branches that we keep maintaining, so that we do not go too far into the past (and hit unmaintained distributions or long-gone dependencies). We currently maintain releases for 18 months before they switch to extended maintenance, which results in between 3 and 4 releases being maintained at the same time. We'd recommend switching to maintaining one-year releases for 24 months, which would result in between 2 and 3 releases being maintained at the same time. Such a change would give our users longer maintenance while reducing backporting work for our developers.
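The arithmetic behind those numbers, as a quick sketch:

    # Releases are cut every `cadence` months and maintained for
    # `maintenance` months. At any given time, every release cut within
    # the last `maintenance` months is still maintained, so the count
    # oscillates between the two values returned here.
    def concurrent_releases(cadence, maintenance):
        low = maintenance // cadence
        return low, low + 1

    print(concurrent_releases(6, 18))    # (3, 4): the current model
    print(concurrent_releases(12, 24))   # (2, 3): the recommended model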
With people churn, the work will be even harder to maintain. I think, however, that it's delaying the problem: we are not _fixing_ the base need. Managing upgrades of 2-3 releases of a complete openstack stack of projects would be an increased effort for maintainers, just done less frequently. For maintainers, it makes more sense to phase work organically, based on project needs. If you are thinking of distros, having to manage all the work when a release is out requires far more coordination than if things were released over time. My experience at SUSE was that the branching model is even debatable: it was more work, and after all, we were taking the code we wanted and putting our patches on top if those didn't make it upstream / weren't backported in time for x reasons (valid or not ;)). So basically, for me, the stable branches have very little value nowadays from the community perspective (it would be good enough if everybody were fixing master, IMO). I am not sure I am the only one seeing it that way. I still feel it's worth documenting.
From the "refstack" (or whatever it's called now) perspective, an "OpenStack Powered Platform xx" is still possible with this model. We need to define a yearly baseline of the versions of the software we expect, the APIs that that software exposes, and the testing around them. No need for branching; "release often" still works, projects are autonomous / owners of their destiny, and we keep the coordination.
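A rough sketch of what such a yearly baseline could look like, expressed as data a verification job could consume (the service names, API versions and test selections below are purely illustrative):

    # Hypothetical "OpenStack 2022" baseline: minimum API versions a
    # deployment must expose, plus the test suites that prove it.
    BASELINE_2022 = {
        'identity': {'min_api': '3.14', 'tests': ['tempest.api.identity']},
        'compute': {'min_api': '2.88', 'tests': ['tempest.api.compute']},
        'image': {'min_api': '2.9', 'tests': ['tempest.api.image']},
    }

    def conforms(exposed):
        """Check a deployment's advertised API versions against the baseline."""
        def vtuple(v):
            return tuple(int(x) for x in v.split('.'))
        return all(service in exposed
                   and vtuple(exposed[service]) >= vtuple(spec['min_api'])
                   for service, spec in BASELINE_2022.items())

    print(conforms({'identity': '3.14', 'compute': '2.90', 'image': '2.9'}))
    # True: the deployment meets the baseline, whatever git branches or
    # tags it was actually built from.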
Sorry for the long post for only my $0.02 ;) Regards, Jean-Philippe Evrard (evrardjp)
On 2021-11-29 13:21:52 +0100 (+0100), Jean-Philippe Evrard wrote: [...]
My experience at SUSE was that the branching model is even debatable: it was more work, and after all, we were taking the code we wanted and putting our patches on top if those didn't make it upstream / weren't backported in time for x reasons (valid or not ;)). So basically, for me, the stable branches have very little value nowadays from the community perspective (it would be good enough if everybody were fixing master, IMO). [...]
The primary reason stable branches exist is to make it easier for us to test and publish backports of critical patches to older versions of the software, rather than expecting our downstream consumers to do that work themselves. If you're saying distribution package maintainers are going to do it anyway and ignore our published backports, then dropping the branching model may make sense, but I've seen evidence to suggest that at least some distros do consume our backports directly. -- Jeremy Stanley
On Mon, Nov 29, 2021, at 14:09, Jeremy Stanley wrote:
The primary reason stable branches exist is to make it easier for us to test and publish backports of critical patches to older versions of the software, rather than expecting our downstream consumers to do that work themselves. If you're saying distribution package maintainers are going to do it anyway and ignore our published backports, then dropping the branching model may make sense, but I've seen evidence to suggest that at least some distros do consume our backports directly.
Don't get me wrong, SUSE is consuming those backports, and (at least was) contributing to them. And yes, I doubt that RH/SUSE/Canonical are simply consuming those packages without ever adding their own patches on a case-by-case basis. So yes, those distros are already doing part of their work downstream (and/or upstream). And for a valid reason: it's part of their job :) That doesn't mean we, as a whole community, still need to do that work for every single consumer. If we are stretched thin, we need to define priorities. I believe our aggressive policy in terms of branching is hurting the rest of the ecosystem; that's why I needed to say things out loud. I mean: the less we branch, the less we backport, and the fewer painful upgrades we have to deal with. It depends on our definition of _when to branch_, of course. Your example of a "critical patch" might be a good reason to branch. Maybe we are in a place where this can be decided on a case-by-case basis, or where we should improve that definition? Regards, JP
On Mon, 2021-11-29 at 14:43 +0100, Jean-Philippe Evrard wrote:
Don't get me wrong, SUSE is consuming those backports, and (at least was) contributing to them. And yes, I doubt that RH/SUSE/Canonical are simply consuming those packages without ever adding their own patches on a case-by-case basis. So yes, those distros are already doing part of their work downstream (and/or upstream). And for a valid reason: it's part of their job :)
That doesn't mean we, as a whole community, still need to do that work for every single consumer. If we are stretched thin, we need to define priorities.
I believe our aggressive policy in terms of branching is hurting the rest of the ecosystem; that's why I needed to say things out loud. I mean: the less we branch, the less we backport, and the fewer painful upgrades we have to deal with. It depends on our definition of _when to branch_, of course. Your example of a "critical patch" might be a good reason to branch. Maybe we are in a place where this can be decided on a case-by-case basis, or where we should improve that definition?

I actually would not consider our branching aggressive; I actually think our cadence is much longer than the rest of the industry or ecosystem. Many of the projects we consume release on a monthly or quarterly basis, with some providing API/ABI-breaking releases once a year. E.g. DPDK only allows ABI breaks in the Q4 release each year, I believe, and the kernel selects an LTS branch to maintain every year based on the last release of the year, but has a ~6-week release schedule. I strongly believe it would be healthier for us to release more often than we do. That does not mean I think we should break version compatibility for our distributed projects more often.
If we released every 6 weeks or once a quarter, but limited the end-user impact of that so that they could mix/match releases for up to a year or two, that would be better for our consumers, downstream distributions and developers. Developers would have to backport less, downstreams could ship/rebase on any of the intermediary releases without breaking compatibility to get features, and consumers could stay on the stable release and only upgrade once a year, or opt for one of the point releases in a year for new features.

Honestly, I can't see how releasing less often would have any effect other than slowing the delivery of features and bug fixes to customers. I don't think it will help distributions reduce their maintenance, since we will just spend more time backporting features because of the increased wait time that some of our customers will not accept. We still receive feature backport requests for our OSP 16 product based on Train (sometimes OSP 10 and 13, based on Newton/Queens respectively). If I tell our large telco customer "OK, we are too far into Yoga to complete that this cycle, so the next upstream release we can target this for is Z, which will release in Q4 2022, and it will take a year for us to productize that release, so you can expect the feature to be completed in 2023", they will respond with "Well, we need it in 2022, so what's the ETA for a backport?". If we go from a 6-month upstream cycle to a 12-month one, that conversation changes from "we can deliver in 9 months" to "18 months upstream + packaging". Sure, with a 12-month cycle there is more likelihood we could fit it into the current cycle, but it is also much, much more painful if we miss a release. There is only a small subset of features that we can backport downstream without breaking interoperability, so we cannot assume we can fall back on that. Stable branches don't necessarily help with that, but not having a very long release cadence does.
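A sketch of the guarantee Sean describes, reduced to its simplest form (the two-year window and the date-based check are assumptions for illustration; real components would negotiate RPC/API versions rather than compare dates):

    from datetime import date

    # Hypothetical rule: components interoperate with any peer released
    # within the agreed compatibility window, so frequent releases stay
    # mix-and-matchable.
    COMPAT_WINDOW_DAYS = 730  # up to two years, per the suggestion above

    def compatible(my_release: date, peer_release: date) -> bool:
        return abs((my_release - peer_release).days) <= COMPAT_WINDOW_DAYS

    # An early-2021 control plane talking to a late-2022 compute node:
    print(compatible(date(2021, 3, 1), date(2022, 12, 1)))  # True

The point is that the upgrade promise becomes a property of the components ("any two releases less than N apart work together") instead of a property of the branch structure.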
---- On Mon, 29 Nov 2021 08:22:31 -0600 Sean Mooney <smooney@redhat.com> wrote ----
If I tell our large telco customer "OK, we are too far into Yoga to complete that this cycle, so the next upstream release we can target this for is Z, which will release in Q4 2022, and it will take a year for us to productize that release, so you can expect the feature to be completed in 2023", they will respond with "Well, we need it in 2022, so what's the ETA for a backport?". If we go from a 6-month upstream cycle to a 12-month one, that conversation changes from "we can deliver in 9 months" to "18 months upstream + packaging". Sure, with a 12-month cycle there is more likelihood we could fit it into the current cycle, but it is also much, much more painful if we miss a release. There is only a small subset of features that we can backport downstream without breaking interoperability, so we cannot assume we can fall back on that.
This is an important point, and I think I mentioned it in one of my replies, but you have put it nicely. Slowing the pace of getting features released/available to users will directly hurt many OpenStack consumers. I know a similar case from telco where a 6-month waiting time is OK for them but not 1 year (we asked if it was OK to deliver a feature upstream in the Z cycle and they said no, that is too late). I think there might be many such cases. And people might start doing 'downstream more' due to the late upstream availability of features.

The 2nd impact I see here is on contributors, which is always my concern in many forms :). We have fewer contributors upstream nowadays, and moving to a 1-year release can impact their upstream role from their company's side, whether to implement / get released any feature or to help in multiple areas. That is just my thought, but anyone managing the upstream contributor budget for their company can validate it. -gmann
On Tue, Nov 30, 2021 at 1:13 PM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
This is an important point, and I think I mentioned it in one of my replies, but you have put it nicely. Slowing the pace of getting features released/available to users will directly hurt many OpenStack consumers. I know a similar case from telco where a 6-month waiting time is OK for them but not 1 year (we asked if it was OK to deliver a feature upstream in the Z cycle and they said no, that is too late). I think there might be many such cases. And people might start doing 'downstream more' due to the late upstream availability of features.
And yet, they won't be able to get the software into the field for just as long, and demands/requirements for feature backports substantially increase vendor production cost by increasing upgrade risk, elongating qualification testing/cycles, and ultimately trickling down to the end operator, who now has to wait even longer. It feels like this is a giant "no win situation", or, for you Star Trek fans, the Kobayashi Maru.
On Tue, Nov 30, 2021 at 2:12 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2021-11-30 13:48:15 -0800 (-0800), Julia Kreger wrote: [...]
It feels like this is a giant "no win situation", or, for you Star Trek fans, the Kobayashi Maru. [...]
So, hacker-Kirk will save us all through cheating? Sounds legit. -- Jeremy Stanley
It feels like we have created a bunch of basically immovable, insurmountable, conflicting obstacles. Kind of like self-dug holes. I'm worried not even hacker-Kirk can save us. Well, maybe his answer might actually be to abolish the integrated release, so he can not only rescue the operators on the ship, but also beam them the tools they need to move forward. Granted, that is change, and human nature is a thing. :(
Hello, On Tue, Nov 30, 2021, at 23:31, Julia Kreger wrote:
It feels like this is a giant "no win situation". It feels like we have created a bunch of basically immovable, insurmountable, conflicting obstacles. Kind of like self-dug holes. I'm worried not even hacker-Kirk can save us. Well, maybe his answer might actually be to abolish the integrated release, so he can not only rescue the operators on the ship, but also beam them the tools they need to move forward. Granted, that is change, and human nature is a thing. :(
Well, I feel completely differently. For me, people are using different words, and are in agreement on some points. Or maybe I am reading this wrong? Here is what I read:
1) Many want more releases, not less. I haven't seen a complaint about tagging more releases.
2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
4) Many people want to make sure it's easy to upgrade, with fewer steps for operators.
I don't see any conflicts, just areas for improvement, for those who have been participating in this topic. Can someone clarify whether I have tunnel vision/bias (as it seems exactly like what I proposed in my first answer)? Thank you in advance. Regards, Jean-Philippe Evrard (evrardjp)
Jean-Philippe Evrard wrote:
[...] Here is what I read:
1) Many want more releases, not less. I haven't seen a complaint about tagging more releases.
2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
Here are a few drawbacks of abandoning the integrated release (which is really a "synchronized release"):
- You would no longer have a baseline of components that are heavily tested together which we do community QA on and (collectively) clearly sign off on, so downstream integrators are on their own and have much more work to do
- You would no longer have clearly-comparable points between distributions. Everyone knows what "running Ussuri" means, which facilitates communication and bug report handling in the community.
- You would no longer have clear community support commitments. We currently maintain and fix bug reports from people running vanilla "Ussuri"... but do we want to care about every combination of components under the sun? (maybe we do already)
- You would no longer have "OpenStack" released, so you miss the regular marketing opportunity to remind the rest of the world that it still exists. The OpenStack brand fades, and it gets more complicated to get development resources to work on it.
Without the synchronized release, OpenStack essentially becomes a rolling distribution of cloud components on which we make very limited guarantees. I guess it is suitable to build maintained distributions on, but it really is no longer directly usable beyond development. Is that what we want?
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
Note that a common stable branch cut is *exactly* the same thing as a synchronized release... So I see 2 and 3 as being opposed views. -- Thierry Carrez (ttx)
Hello, On Sat, Dec 4, 2021, at 15:56, Thierry Carrez wrote:
Jean-Philippe Evrard wrote:
[...] Here is what I read:
1) Many want more releases, not less. I haven't seen a complaint about tagging more releases.
2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
Here are a few drawbacks of abandoning the integrated release (which is really a "synchronized release"):
- You would no longer have a baseline of components that are heavily tested together which we do community QA on and (collectively) clearly sign off on, so downstream integrators are on their own and have much more work to do
I don't think so. We can still have that without the synchronized release. We decide what we test together. (Maybe the TC can define that?) For example, with Zuul, we can test the master branches of all projects together to ensure the latest code always works according to the set criteria. (No change there!)

Then, what becomes "openstack version x" is just a manifest of the SHAs or tags of the projects, tested together. (No change again.) (Let's call this proposition 1.)

We don't have multiple "synchronized" releases of stable branches, so we don't even have to bring in any "openstack version x.y". Still, nothing prevents us from defining an "openstack version x.y" if we want to, with an updated manifest compared to "version x.0". Technically, proposition 1 amounts to running a deploy tool + tempest before release, based on the manifest (see the sketch just below). It's literally no new technology. I am relatively sure many (if not all) deployment tools can do that.

What makes more sense to me is that we define "openstack version x" in terms of APIs, and that we have the test tooling to ensure that software wanting to be integrated into "version x" passes said tests. (Let's call this proposition 2.) It allows alternative implementations of the software, if someone is crazy enough to do that.

I agree it might look like "We are abandoning the branches; what will we do for people deploying regularly on a single branch?". Well, it doesn't change much: those are currently consuming the manifests of releases, bumping the SHAs from a "stable" branch of a repo. If a project/repo _decides to branch_ for a very critical reason, then it can still happen. If there is no reason to branch, then you would still bump the latest branch available. Still no issue there.
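A minimal sketch of proposition 1 (the project list and SHAs are placeholders; the deploy step is reduced to a plain checkout):

    import subprocess

    # Hypothetical "openstack version x" manifest: nothing more than the
    # combination of per-project revisions that passed integrated
    # testing together.
    OPENSTACK_X = {
        'nova': 'a1b2c3d',
        'neutron': 'e4f5a6b',
        'keystone': 'c7d8e9f',
    }

    def checkout_and_test(manifest):
        """Pin every project to its manifest revision, then run the suite."""
        for project, sha in manifest.items():
            subprocess.run(['git', '-C', project, 'checkout', sha], check=True)
        # Stand-in for the "deploy tool + tempest" step from the proposal:
        subprocess.run(['tempest', 'run', '--smoke'], check=True)

    checkout_and_test(OPENSTACK_X)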
- You would no longer have clearly-comparable points between distributions. Everyone knows what "running Ussuri" means, which facilitates communication and bug report handling in the community.
I am not sure what "running Ussuri" means. Is that when the branch was cut, or the latest SHA of all the projects' Ussuri branches? Having an "OpenStack version Ussuri 1" corresponding to a manifest of project SHAs and/or API versions is far clearer to me.
- You would no longer have clear community support commitments. We currently maintain and fix bug reports from people running vanilla "Ussuri"... but do we want to care about every combination of components under the sun? (maybe we do already)
We don't stop maintainers from contributing by being clearer about what we release and how we branch... If people want to maintain some old version and _need to branch a project_, I feel it's still possible. But it would now be within the power of the project to decide whether it makes sense to do so, instead of being forced to manage something that might be stretching the teams thin.
- You would no longer have "OpenStack" released, so you miss the regular marketing opportunity to remind the rest of the world that it still exists. The OpenStack brand fades, and it gets more complicated to get development resources to work on it.
Again, it's wording. Please see my proposal above. I understand why it's a concern for the foundation however ;)
Without the synchronized release, OpenStack essentially becomes a rolling distribution of cloud components on which we make very limited guarantees. I guess it is suitable to build maintained distributions on, but it really is no longer directly usable beyond development. Is that what we want?
We would indeed be more "rolling", but it doesn't prevent tagging / point-in-time testing and quality assurance. Also, if we have integrated testing during the cycle and before the tagging, then it doesn't remove any guarantee, does it? I disagree that it's not usable beyond development. Here is my reasoning:
1) I don't see it as removing any guarantee if we test things out :)
2) Users of openstack are most likely using deployment tooling (based on a configuration management tool of choice) to deploy their clouds, not a manual deployment. This seems to be confirmed by the user survey. Do you mean that a change in the branching model would irreversibly break all those tools, and make the ecosystem "no longer directly usable beyond development"?
Keep in mind I see those deployment toolings as part of openstack. Not only because they are part of the ecosystem, but because some are under OpenStack governance (OpenStack charms, openstack-chef, puppet openstack, osa, openstack-helm, kolla, tripleO). Hence I consider that OpenStack would still be usable beyond development. I know at least one company that would continue to believe in OpenStack ;)
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
Note that a common stable branch cut is *exactly* the same thing as a synchronized release... So I see 2 and 3 as being opposed views.
I agree with you there on common stable branch = synchronized release, especially if "criticality" = "whenever we decide to cut a release". I still wanted to mention what was said, for the reader. It doesn't mean that everyone agrees on the definition of criticality yet, or maybe I am wrong? ;) Next to this, I have questions:
A) Am I the only one wanting to act on this?
B) Am I the only one caring?
C) What should be the next steps if I want to change this model?
D) Should I propose a change in governance, and sync with the release management team? As this goes deep into the foundation's model, I feel like we need proper coordination to make this happen.
E) Do you consider that all the energy spent changing things does not bring enough positive reward for the whole ecosystem?
I see people talking about changing releases for years now, and I haven't seen a single change in our behaviour (or maybe I missed something?). Is that Stockholm syndrome? ;) JP
Jean-Philippe Evrard wrote:
On Sat, Dec 4, 2021, at 15:56, Thierry Carrez wrote:
Jean-Philippe Evrard wrote:
[...] Here is what I read: 1) Many want more releases, not less. I haven't seen a complaint about tagging more releases. 2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
Here are a few drawbacks of abandoning the integrated release (which is really a "synchronized release"):
- You would no longer have a baseline of components that are heavily tested together which we do community QA on and (collectively) clearly sign off on, so downstream integrators are on their own and have much more work to do
I don't think so. We can still have that without the synchronized release. We decide what we test together. (Maybe the TC can define that?) For example, with Zuul, we can test the master branch of all projects together to ensure the latest branch always works according to the set criteria. (No change there!)
Then, what becomes "openstack version x" is just about a manifest of the SHAs or the tags of the projects, tested together. (No change again). (Let's call this proposition 1)
I'm still wrapping my head around your proposal... It appears you want to drop the "stable branch" part of the release rather than the regular tagging and tracking of what has been tested together (the "integrated release"). It definitely sounds possible to me to:
- have all components use an intermediary-released model (release as needed, no common feature freeze, no RCs)
- have regular points in time where we communicate combinations of components that work together
- not create stable branches for those components, not backport any bugfix, and just roll forward (single-branch development)
Then you would still have a comparison baseline ("I run X"), and you would still have "openstack releases" that you can communicate and generate excitement around. And it would definitely reduce the backporting work happening upstream. However, I am not sure I see how that proposal solves the upgrade pressure, or makes downstream distribution work any easier... which are the two major reasons people ask for a longer cycle in the current system. If anything, that would make both worse, no? -- Thierry Carrez (ttx)
Hello, On 8 Dec 2021, at 12:16, Thierry Carrez <thierry@openstack.org> wrote:
I'm still wrapping my head around your proposal... It appears you want to drop the "stable branch" part of the release rather than the regular tagging and tracking of what has been tested together (the "integrated release").
It definitely sounds possible to me to:
- have all components use an intermediary-released model (release as needed, no common feature freeze, no RCs)
- have regular points in time where we communicate combinations of components that work together
- not create stable branches for those components, not backport any bugfix, and just roll forward (single-branch development)
That's what I am proposing indeed. I have added the exception that _projects_ can decide to go multi-branch, where it makes sense. The TC decides what warrants a branch, to have a common behaviour across the whole of OpenStack (see examples below).
Then you would still have a comparison baseline ("I run X"), and you would still have "openstack releases" that you can communicate and generate excitement around. And it would definitely reduce the backporting work happening upstream.
Correct.
However, I am not sure I see how that proposal solves the upgrade pressure, or makes downstream distribution work any easier... which are the two major reasons people ask for a longer cycle in the current system. If anything, that would make both worse, no?
1) Small backports don't have to happen until big, disruptive work happens. This relieves a bit of pressure on contributions, and makes the contributions more meaningful tbh (cf. the "oh, it wasn't backported to this branch" syndrome ;)).
2) When a new branch is created, the _project_ could decide what to do with the branch: "We will abandon this legacy code in x". The TC might decide a series of rules for when to close branches (or if it's even allowed).
3) Downstream distributions would still get maintained software, and in fact it would be easier to understand. E.g. for packagers: the master branch rolls forward, and has regular tags (intermediary-released), none of them disruptive for users/packaging. If a project decides to branch (because they brought a very disruptive change), then it could be considered by packagers as creating a "new version" of a package, obsoleting an older package. Very simple for packagers AND users.
4) The result of a strong "let's never branch" model is that we can have easier upgrades: we can decide that upgrades between two versions of the same branch have to be seamless, while a switch from an older branch to a newer branch must require a "migration" tool in place.
Regards, Jean-Philippe Evrard (evrardjp)
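To illustrate point 4, a minimal sketch in Python of the proposed upgrade rule, assuming a hypothetical (branch, tag) version scheme: upgrades within a branch are expected to be seamless, while crossing a branch boundary requires a migration tool.

    from typing import NamedTuple

    # Hypothetical version scheme: a branch only appears after a disruptive
    # change; intermediary releases are tags on that branch.
    class Version(NamedTuple):
        branch: str  # e.g. "2021.1"
        tag: str     # e.g. "2021.1.3"

    def upgrade_path(current: Version, target: Version) -> str:
        """Return what an operator should expect for this upgrade."""
        if current.branch == target.branch:
            return "seamless: rolling upgrade within the same branch"
        return "disruptive: run the project's migration tool first"

    print(upgrade_path(Version("2021.1", "2021.1.3"),
                       Version("2021.1", "2021.1.7")))  # seamless
    print(upgrade_path(Version("2021.1", "2021.1.7"),
                       Version("2022.1", "2022.1.0")))  # disruptive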
On Tue, Dec 7, 2021 at 4:42 PM Jean-Philippe Evrard <openstack@a.spamming.party> wrote:
Hello,
On Sat, Dec 4, 2021, at 15:56, Thierry Carrez wrote:
Jean-Philippe Evrard wrote:
[...] Here is what I read: 1) Many want more releases, not less. I haven't seen a complaint about tagging more releases. 2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
Here are a few drawbacks of abandoning the integrated release (which is really a "synchronized release"):
- You would no longer have a baseline of components that are heavily tested together which we do community QA on and (collectively) clearly sign off on, so downstream integrators are on their own and have much more work to do
I don't think so. We can still have that without the synchronized release. We decide what we test together. (Maybe the TC can define that?) For example, with Zuul, we can test the master branch of all projects together to ensure the latest branch always works according to the set criteria. (No change there!)
Then, what becomes "openstack version x" is just about a manifest of the SHAs or the tags of the projects, tested together. (No change again). (Let's call this proposition 1)
We wouldn't have multiple "synchronized" releases of stable branches, so we wouldn't even have to bring out any "openstack version x.y". Still, it doesn't prevent us from defining an "openstack version x.y" if we want to, with an updated manifest compared to "version x.0".
Technically proposition 1 looks like we run a deploy tool + tempest before release, based on the manifest. It's literally no new technology. I am relatively sure many (if not all) deployment tools can do that.
What makes more sense to me is that we define "openstack version x" in terms of APIs, and that we have the test tooling to ensure that software wanting to be integrated into "version x" passes said tests. (Let's call this proposition 2.) It allows alternative implementations of the software, if someone is crazy enough to do that.
I agree it might look like "We are abandoning the branches, what will we do for people deploying regularly on a single branch?". Well, it doesn't change much: those are currently consuming the manifests of releases, bumping the SHAs from a "stable" branch of a repo. If a project/repo _decides to branch_ for a very critical reason, then it can still happen. If there is no reason to branch, then you would still bump the latest branch available. Still no issue there.
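To make proposition 1 concrete, a minimal sketch in Python of what such a tested manifest, and the pins a deployment tool would consume from it, could look like; the project SHAs and tags below are hypothetical placeholders, not real releases.

    # Hypothetical manifest: "openstack version x" as the set of project
    # SHAs/tags that passed integrated testing together.
    MANIFEST = {
        "name": "openstack-x",
        "tested_on": "2021-12-07",
        "components": {
            "nova":    {"sha": "0123abc", "tag": "24.1.0"},
            "neutron": {"sha": "4567def", "tag": "19.2.0"},
            "glance":  {"sha": "89abcde", "tag": "23.0.1"},
        },
    }

    def pins_for_deploy_tool(manifest):
        """Reduce the manifest to the SHA pins a deployment tool consumes."""
        return {name: c["sha"] for name, c in manifest["components"].items()}

    print(pins_for_deploy_tool(MANIFEST))
    # {'nova': '0123abc', 'neutron': '4567def', 'glance': '89abcde'}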
- You would no longer have clearly-comparable points between distributions. Everyone knows what "running Ussuri" means, which facilitates communication and bug report handling in the community.
I am not sure what "running Ussuri" means. Is that when the branch was cut, or the latest SHA of all the projects' Ussuri branches? Having "OpenStack version Ussuri 1" correspond to a manifest of project SHAs and/or API versions is far clearer to me.
- You would no longer have clear community support commitments. We currently maintain and fix bug reports from people running vanilla "Ussuri"... but do we want to care about every combination of components under the sun? (maybe we do already)
We don't stop maintainers from contributing by being clearer about what we release and how we branch... If people want to maintain some old version and _need to branch a project_, I feel it's still possible. But it's now in the power of the project to decide whether it makes sense to do so, instead of being forced to manage something that might be stretching the teams thin.
- You would no longer have "OpenStack" released, so you miss the regular marketing opportunity to remind the rest of the world that it still exists. The OpenStack brand fades, and it gets more complicated to get development resources to work on it.
Again, it's wording. Please see my proposal above. I understand why it's a concern for the foundation however ;)
Without the synchronized release, OpenStack essentially becomes a rolling distribution of cloud components on which we make very limited guarantees. I guess it is suitable to build maintained distributions on, but it really is no longer directly usable beyond development. Is that what we want?
We would indeed be more "rolling", but it doesn't prevent tagging / point-in-time testing and quality assurance. Also, if we have integrated testing during the cycle and before the tagging, then it doesn't remove any guarantee, does it?
I disagree that it's not usable beyond development. Here is my reasoning:
1) I don't see it as removing any guarantee if we test things out :)
2) Users of OpenStack are most likely using deployment tooling (based on the configuration management tool of their choice) to deploy their clouds, not a manual deployment. This seems to be confirmed by the user survey. Do you mean that a change in the model of branching would irreversibly break all those tools, and make the ecosystem "no longer directly usable beyond development"?
Keep in mind I see those deploy toolings as part of OpenStack. Not only because they are part of the ecosystem, but because some are under OpenStack governance (OpenStack charms, openstack-chef, puppet openstack, osa, openstack-helm, kolla, TripleO).
Hence I consider that OpenStack would still be usable beyond development. I know at least one company that would continue to believe in OpenStack ;)
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
Note that a common stable branch cut is *exactly* the same thing as a synchronized release... So I see 2 and 3 as being opposed views.
I agree with you there on common stable branch = synchronized release, especially if "criticality" = "whenever we decide to cut a release". I still wanted to mention what was said, for the reader.
It doesn't mean that everyone agrees on the definition of criticality yet, or maybe I am wrong? ;)
Next to this, I have questions: A) Am I the only one wanting to act on this?
Probably
B) Am I the only one caring?
Nope
C) What should be the next steps if I want to change this model? D) Should I propose a change in governance, and sync with the release management team? As this goes deep into the foundation's model, I feel like we need proper coordination to make this happen.
Likely patches to the governance repos are needed (ref C & D)
E) Do you consider that all the energy spent to change things does not bring enough positive reward for the whole ecosystem?
Absolutely. As someone who needs to think about both upstream and downstream aspects of the release, and who has cared a lot about stable maintenance of OpenStack for years, what you're proposing here is literally pushing all our stable maintenance work downstream and not even trying to share the efforts with the rest of the community. I also feel like this would turn really quickly into the point where, if you're not following master on your deployment or using our product directly, I'd be happy to advise you to upgrade to the latest release and see if your problem still exists, or go and talk to your deployment tool guys if they have seen the issue with whatever hashes they happen to deploy. I could have seen this as a tempting model 8-9 years ago, when the competition was who is trailing master the closest and everyone was focusing on getting the next absolutely necessary feature set in to make OpenStack usable. Not now, when we have matured considerably and are looking to provide stable production environments rather than rolling new code into production on a weekly basis. I'm not sure I even managed to grasp your proposal fully, but it really feels like half a step forward and a mile, mile and a half backwards.
I see people talking about changing releases for years now, and I haven't seen a single change in our behaviour (or maybe I missed something?). Is that Stockholm syndrome? ;)
I see this as a matter that a handful of people want to bring up every few months (for years now, we've had this discussion probably close to a dozen times), and I have a feeling that the majority of the community is just tired of copy-pasting the same arguments every round to avoid breaking what works, and is genuinely waiting for the thread to be buried for the next few months so they can get back to work. - Erno "jokke" Kuvaja
JP
Hello, On Wed, Dec 8, 2021, at 13:44, Erno Kuvaja wrote:
Absolutely. As someone who needs to think about both upstream and downstream aspects of the release, and who has cared a lot about stable maintenance of OpenStack for years, what you're proposing here is literally pushing all our stable maintenance work downstream and not even trying to share the efforts with the rest of the community.
Not at all. I am fine with sharing the effort with the community. We just need to be "smarter" about the branching, and get the efforts right.
I also feel like this would turn really quickly into the point where, if you're not following master on your deployment or using our product directly, I'd be happy to advise you to upgrade to the latest release and see if your problem still exists, or go and talk to your deployment tool guys if they have seen the issue with whatever hashes they happen to deploy.
Let's be honest: when we find a bug, the first question we ask is "which version do you run", no? We don't ask for a branch, we ask for a SHA. Nothing changes there ;)
I could have seen this as a tempting model 8-9 years ago, when the competition was who is trailing master the closest and everyone was focusing on getting the next absolutely necessary feature set in to make OpenStack usable. Not now, when we have matured considerably and are looking to provide stable production environments rather than rolling new code into production on a weekly basis. I'm not sure I even managed to grasp your proposal fully, but it really feels like half a step forward and a mile, mile and a half backwards.
I don't think it's ever been a race to follow the latest master branch. It's never been for me, at least. For me it's about:
- being smart about the definition of branching
- forcing certain requirements on when to branch, to be consistent in the community, WHILE bringing useful features for users
- giving freedom to projects (of course while keeping consistency with the help of the TC).
Being stable is fine. In fact, it's the dream. Stable doesn't mean stale, however ;)
I see people talking about changing releases for years now, and I haven't seen a single change in our behaviour (or maybe I missed something?). Is that Stockholm syndrome? ;)
I see this as a matter that a handful of people want to bring up every few months (for years now, we've had this discussion probably close to a dozen times), and I have a feeling that the majority of the community is just tired of copy-pasting the same arguments every round to avoid breaking what works, and is genuinely waiting for the thread to be buried for the next few months so they can get back to work.
I feel the same. But instead of burying it, I would like to act. Regards, JP
On Wed, 2021-12-08 at 19:04 +0100, Jean-Philippe Evrard wrote:
Hello,
On Wed, Dec 8, 2021, at 13:44, Erno Kuvaja wrote:
Absolutely. As someone who needs to think about both upstream and downstream aspects of the release, and who has cared a lot about stable maintenance of OpenStack for years, what you're proposing here is literally pushing all our stable maintenance work downstream and not even trying to share the efforts with the rest of the community.
Not at all. I am fine with sharing the effort with the community. We just need to be "smarter" about the branching, and get the efforts right.
I don't necessarily think our current branching is in any way not smart.
I also feel like this would turn really quickly into the point where, if you're not following master on your deployment or using our product directly, I'd be happy to advise you to upgrade to the latest release and see if your problem still exists, or go and talk to your deployment tool guys if they have seen the issue with whatever hashes they happen to deploy.
Let's be honest: when we find a bug, the first question we ask is "which version do you run", no? We don't ask for a branch, we ask for a SHA. Nothing changes there ;)
I could have seen this as a tempting model 8-9 years ago, when the competition was who is trailing master the closest and everyone was focusing on getting the next absolutely necessary feature set in to make OpenStack usable. Not now, when we have matured considerably and are looking to provide stable production environments rather than rolling new code into production on a weekly basis. I'm not sure I even managed to grasp your proposal fully, but it really feels like half a step forward and a mile, mile and a half backwards.
I don't think it's ever been a race to follow the latest master branch. It's never been for me, at least.
For me it's about:
- being smart about the definition of branching
- forcing certain requirements on when to branch, to be consistent in the community, WHILE bringing useful features for users
- giving freedom to projects (of course while keeping consistency with the help of the TC).
For me, creating a branch has always been to state that we have frozen the feature set for a given release, and that you can integrate it into your product or production with the expectation that it will only receive bug fixes going forward.
Unless we decide just to adopt a semver branching model, I don't really see how what you're proposing will be beneficial to consuming upstream releases or productising downstream; to me this would be a regression and make my life harder. If we take a semver approach, bump the major version only when backward-incompatible changes are made, and define some support boundary on when that is allowed, then that might be an alternative branching strategy that could work, but nova at least still receives enough feature requests that we may still need to bump the major version every 6 months. With that said, we did not make a DB schema change between Train and Wallaby, so we have stabilised somewhat compared to the early years, where we had several in each release.
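To illustrate the semver approach described here, a minimal sketch in Python, with illustrative change categories (not a real nova changelog format), of deciding which part of the version to bump:

    def next_version(version, changes):
        """Bump major only for backward-incompatible changes (e.g. a
        disruptive DB schema change), minor for features, patch otherwise.
        Change categories are illustrative labels."""
        major, minor, patch = (int(p) for p in version.split("."))
        if "backward-incompatible" in changes:
            return f"{major + 1}.0.0"
        if "feature" in changes:
            return f"{major}.{minor + 1}.0"
        return f"{major}.{minor}.{patch + 1}"

    print(next_version("24.1.0", {"feature"}))                # 24.2.0
    print(next_version("24.1.0", {"backward-incompatible"}))  # 25.0.0
    print(next_version("24.1.0", {"bugfix"}))                 # 24.1.1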
Being stable is fine. In fact, it's the dream. Stable doesn't mean stale, however ;)
I see people talking about changing releases for years now, and I haven't seen a single change in our behaviour (or maybe I missed something?). Is that Stockholm syndrome? ;)
I see this as a matter that a handful of people want to bring up every few months (for years now, we've had this discussion probably close to a dozen times), and I have a feeling that the majority of the community is just tired of copy-pasting the same arguments every round to avoid breaking what works, and is genuinely waiting for the thread to be buried for the next few months so they can get back to work.
I feel the same. But instead of burying it, I would like to act.
Regards, JP
---- On Wed, 01 Dec 2021 16:43:32 -0600 Jean-Philippe Evrard <openstack@a.spamming.party> wrote ----
Hello,
On Tue, Nov 30, 2021, at 23:31, Julia Kreger wrote:
It feels like this is a giant "no-win situation". It feels like we have created a bunch of basically immovable, insurmountable, conflicting obstacles. Kind of like self-digging holes. I'm worried not even hacker-Kirk can save us. Well, maybe his answer might actually be to abolish the integrated release, so he can not only rescue the operators on the ship, but also beam them the tools they need to move forward. Granted, that is change, and human nature is a thing. :(
Well, I feel completely differently. For me, people are using different words, and are in agreement on some points. Or maybe I am reading this wrong?
Here is what I read:
1) Many want more releases, not less. I haven't seen a complaint about tagging more releases.
2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
4) Many people want to make sure it's easy to upgrade, with fewer steps for operations.
I don't see any conflicts, just areas for improvement, for those who have been participating on this topic.
Can someone clarify if I have tunnel vision/bias (as it seems exactly what I proposed in my first answer)?
The TC is planning to discuss this topic on 3rd Feb, 16:00 UTC; let us know if the time works for you. Accordingly, I will send the joining link. -gmann
Thank you in advance.
Regards, Jean-Philippe Evrard (evrardjp)
---- On Fri, 28 Jan 2022 19:47:16 -0600 Ghanshyam Mann <gmann@ghanshyammann.com> wrote ----
---- On Wed, 01 Dec 2021 16:43:32 -0600 Jean-Philippe Evrard <openstack@a.spamming.party> wrote ----
Hello,
On Tue, Nov 30, 2021, at 23:31, Julia Kreger wrote:
It feels like this is a giant "no-win situation". It feels like we have created a bunch of basically immovable, insurmountable, conflicting obstacles. Kind of like self-digging holes. I'm worried not even hacker-Kirk can save us. Well, maybe his answer might actually be to abolish the integrated release, so he can not only rescue the operators on the ship, but also beam them the tools they need to move forward. Granted, that is change, and human nature is a thing. :(
Well, I feel completely differently. For me, people are using different words, and are in agreement on some points. Or maybe I am reading this wrong?
Here is what I read:
1) Many want more releases, not less. I haven't seen a complaint about tagging more releases.
2) More than one person is proposing to abandon the integrated release, and nobody has complained about it.
3) Many people seem eager to carry "stable branches" for "critical patches", but no new definition of such criticality was given.
4) Many people want to make sure it's easy to upgrade, with fewer steps for operations.
I don't see any conflicts, just areas for improvement, for those who have been participating on this topic.
Can someone clarify if I have tunnel vision/bias (as it seems exactly what I proposed in my first answer)?
The TC is planning to discuss this topic on 3rd Feb, 16:00 UTC; let us know if the time works for you. Accordingly, I will send the joining link.
I might not be available this week, we will schedule this meeting next week or so. -gmann
-gmann
Thank you in advance.
Regards, Jean-Philippe Evrard (evrardjp)
---- On Wed, 02 Feb 2022 13:17:19 -0600 Jean-Philippe Evrard <openstack@a.spamming.party> wrote ----
On Tue, Feb 1, 2022, at 23:14, Ghanshyam Mann wrote:
I might not be available this week, we will schedule this meeting next week or so.
Ok, I hope I won't miss the next date then! :) Thanks for organising this, Ghanshyam.
We will meet tomorrow right after the TC meeting.
Time: 10th Feb, 16:00 UTC
Location: Voice/Video call @ https://meetpad.opendev.org/OpenStackReleasecadence
-gmann
Regards, JP
---- On Wed, 09 Feb 2022 11:20:13 -0600 Ghanshyam Mann <gmann@ghanshyammann.com> wrote ----
---- On Wed, 02 Feb 2022 13:17:19 -0600 Jean-Philippe Evrard <openstack@a.spamming.party> wrote ----
On Tue, Feb 1, 2022, at 23:14, Ghanshyam Mann wrote:
I might not be available this week, we will schedule this meeting next week or so.
Ok, I hope I won't miss the next date then! :) Thanks for organising this, Ghanshyam.
We will meet tomorrow right after the TC meeting.
Time: 10th Feb, 16:00 UTC
Location: Voice/Video call @ https://meetpad.opendev.org/OpenStackReleasecadence
We are about to start the discussion, please join if you are interested. -gmann
-gmann
Regards, JP
I am sorry I didn't see this in time. 'Tomorrow' is short notice ;) Can you link to the meeting notes, please? Regards, JP
On 2022-02-14 08:42:03 +0100 (+0100), JP E wrote:
I am sorry I didn't see this in time. 'Tomorrow' is short notice ;)
Can you link to the meeting notes, please?
https://etherpad.opendev.org/p/openstackreleasecadence -- Jeremy Stanley
Thanks Jeremy, that's perfect!
From the notes, I am not sure what "Extreme positions were expressed ("no longer do releases")" means (L28). Might be worth clarifying...
Overall, I am happy with Dan's proposal; it's a positive point to ensure in CI that we can upgrade across every two 6-month releases. I am not sure whether this will help us in the long run to keep a forced release every x months (rather than team autonomy + 'tag when needed' + refstack for the 'coordinated release'... especially when we'll have fewer contributions). However, that's not a hill I will die on, so let's move on :) Thanks to everyone involved in making this better. Regards, JP
---- On Mon, 14 Feb 2022 13:58:14 -0600 JP E <openstack@a.spamming.party> wrote ----
Thanks Jeremy, that's perfect!
From the notes, I am not sure what "Extreme positions were expressed ("no longer do releases")" means (L28). Might be worth clarifying...
Overall, I am happy with Dan's proposal; it's a positive point to ensure in CI that we can upgrade across every two 6-month releases.
I am not sure whether this will help us in the long run to keep a forced release every x months (rather than team autonomy + 'tag when needed' + refstack for the 'coordinated release'... especially when we'll have fewer contributions). However, that's not a hill I will die on, so let's move on :)
Thanks to everyone involved in making this better.
Thanks JP, and sorry for the short meeting notice. For the long term, Dan's proposal gives us more flexibility and things to check, which we can keep improving as the situation evolves in the future. -gmann
Regards, JP
On Mon, 2021-11-29 at 13:09 +0000, Jeremy Stanley wrote:
On 2021-11-29 13:21:52 +0100 (+0100), Jean-Philippe Evrard wrote: [...]
My experience at SUSE was that the branching model is even debatable: it was more work, and after all, we were taking the code we wanted and putting our patches on top if those didn't make it upstream / weren't backported in time, for x reasons (valid or not ;)). So basically, for me, the stable branches have very little value nowadays from the community perspective (it would be good enough if everybody were fixing master, IMO). [...]
The primary reason stable branches exist is to make it easier for us to test and publish backports of critical patches to older versions of the software, rather than expecting our downstream consumers to do that work themselves. If you're saying distribution package maintainers are going to do it anyway and ignore our published backports, then dropping the branching model may make sense, but I've seen evidence to suggest that at least some distros do consume our backports directly.
Just speaking from personal experience backporting patches upstream and downstream for Red Hat OSP: I have much, much higher confidence in backporting patches downstream by first backporting them upstream via the stable branches, due to the significantly better upstream CI coverage before the patch is merged. Most of our downstream CI happens after the code is merged, as part of a unified build/compose that is then tested by our QE, often several weeks after it's merged, before the release of our next downstream .z release. We have some patch CI, but it's really minimal in comparison to the test coverage we have upstream on stable branches, and it's also more work to do a downstream-only backport or a preemptive downstream backport anyway. Since we skip releases downstream, if I want to do a downstream-only backport (e.g. because it's a feature), I have to backport across 3+ releases to Train in one go, which is way harder than resolving conflicts per release. If I'm doing both an upstream backport and a preemptive downstream backport (to not have to wait for upstream to merge), it's also kind of a pain, as if I need to make revisions upstream we will get a merge conflict the next time our downstream branch is rebased. So basically, if I can, I will always do an upstream-only backport and wait for the change to be synced downstream via an import.
To me the stable branches provide great value, even more value if we allowed feature backports, as it would eliminate the need for us to carry those downstream. If we could backport features, we could almost avoid downstream branches entirely, for everything other than perhaps CVE fixes or other very rare cases. Even in their current state, however, I strongly think our stable branches add value, and they are a compelling aspect of our community. Not all open source projects maintain upstream stable branches as we do, and in such cases you are often forced to choose between running the package from your distro or the project directly to get the fixes and features you need. While we do eventually stop importing from upstream into our downstream packages, OSP 13 z15, which released in March this year, was a full import from stable/queens, and the OSP 16.2.2 release next year will also be a full import from stable/train once our cherry-pick-only release 16.2.1 is out the door to customers. While we do carry patches downstream which must be rebased on top of upstream every time we import, upstream stable adds a lot of value, and we mitigate the overhead of downstream patches by applying a very strict feature backport policy, which basically amounts to no API, DB, RPC or versioned object changes. You would be surprised how many features can still be backported with those restrictions, but it avoids the upgrade and interoperability impact of most feature backports.
tl;dr: please let's keep the upstream stable branches for as long as people are willing to maintain them.
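As an illustration of the strict feature backport policy described above (no API, DB, RPC or versioned object changes), a minimal sketch in Python; the change-area labels are illustrative, not an actual nova or OSP schema:

    # A feature is backportable only if it touches none of the
    # restricted areas named in the policy above.
    RESTRICTED = {"api", "db-schema", "rpc", "versioned-objects"}

    def backportable(change_areas):
        """Return True if a proposed feature backport passes the policy."""
        return not (set(change_areas) & RESTRICTED)

    print(backportable({"scheduler", "config"}))  # True: allowed
    print(backportable({"api", "config"}))        # False: blocked by policy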
On Fri, Nov 5, 2021 at 2:39 PM Thierry Carrez <thierry@openstack.org> wrote:
Hi everyone,
The (long) document below reflects the current position of the release management team on a popular question: should the OpenStack release cadence be changed? Please note that we only address the release management / stable branch management facet of the problem. There are other dimensions to take into account (governance, feature deprecation, supported distros...) to get a complete view of the debate.
Introduction ------------
The subject of how often OpenStack should be released has been regularly debated in the OpenStack community. OpenStack started with a 3-month release cycle, then switched to 6-month release cycle starting with Diablo. It is often thought of a release management decision, but it is actually a much larger topic: a release cadence is a trade-off between pressure to release more often and pressure to release less often, coming in from a lot of different stakeholders. In OpenStack, it is ultimately a Technical Committee decision. But that decision is informed by the position of a number of stakeholders. This document gives historical context and describes the current release management team position.
The current trade-off ---------------------
The main pressure to release more often is to make features available to users faster. Developers get a faster feedback loop, hardware vendors ensure software is compatible with their latest products, and users get exciting new features. "Release early, release often" is a best practice in our industry -- we should generally aim at releasing as often as possible.
But that is counterbalanced by pressure to release less often. From a development perspective, each release cycle comes with some process overhead. On the integrators side, a new release means packaging and validation work. On the users side, it means pressure to upgrade. To justify that cost, there needs to be enough user-visible benefit (like new features) in a given release.
For the last 10 years for OpenStack, that balance has been around six months. Six months let us accumulate enough new development that it was worth upgrading to / integrating the new version, while giving enough time to actually do the work. It also aligned well with Foundation events cadence, allowing to synchronize in-person developer meetings date with start of cycles.
For sure I'm not talking on behalf of every project (or I might be, but I just don't know the dynamics across well enough). Anyway, I think this assessment is missing one critical point, which is a release being the break-off point for bikeshedding. I see a lot of genuine urgency to finally make a decision that has been going back and forth from the early cycle, and the difference of opinion being finally solved when we're hitting the feature freeze / RC time. This does not only apply to feature work, but to lots of bugs as well that do not have active community members pushing them to get fixed (and backported) early. That push before the RC is tagged has been steadily intense in the few weeks before release; it's taxing for sure, but it also ensures that we do get things done, or make a real active decision to push it down the line for at least another half a year. I think the biggest actual drawback of a longer release cycle is to lose this checkpoint (let's be honest here, no-one cares about milestones). Not only would a longer release make those hard decisions of "Are we including the work in this release or not" an even rarer occasion, but the push before the RC would intensify a lot when we have double the time to accumulate review debt before we actually have to have that discussion. I think more valuable contributions would be lost by losing traction (and people just deciding to carry the patches as one more downstream-only thing as the community can't get around it).
What changed
------------
The major recent change affecting this trade-off is that the pace of new development in OpenStack slowed down. The rhythm of changes was divided by 3 between 2015 and 2021, reflecting that OpenStack is now a mature and stable solution, where accessing the latest features is no longer a major driver. That reduces some of the pressure for releasing more often. At the same time, we have more users every day, with larger and larger deployments, and keeping those clusters constantly up to date is an operational challenge. That increases the pressure to release less often. In essence, OpenStack is becoming much more like a LTS distribution than a web browser -- something users like moving slow.
Over the past years, project teams also increasingly decoupled individual components from the "coordinated release". More and more components opted for an independent or intermediary-released model, where they can put out releases in the middle of a cycle, making new features available to their users. This increasingly opens up the possibility of a longer "coordinated release" which would still allow development teams to follow "release early, release often" best practices. All that recent evolution means it is (again) time to reconsider if the 6-month cadence is what serves our community best, and in particular if a longer release cadence would not suit us better.
The release management team position on the debate --------------------------------------------------
While releasing less often would definitely reduce the load on the release management team, most of the team's work being automated, we do not think it should be a major factor in motivating the decision. We should not adjust the cadence too often though, as there is a one-time cost in switching our processes. In terms of impact, we expect that a switch to a longer cycle will encourage more project teams to adopt a "with-intermediary" release model (rather than the traditional "with-rc" single release per cycle), which may lead to abandoning the latter, hence simplifying our processes. Longer cycles might also discourage people from committing to PTL or release liaison work. We'd probably need to manage expectations there, and encourage more frequent switches (or create alternate models).
While most of the PTL positions have not been resolved in elections for a while, this is a great note to keep in mind. Fundamentally, we should keep that process as the main means to select new PTLs, and pushing more and more towards handovers without even room for debate might bite us one day.
If the decision is made to switch to a longer cycle, the release management team recommends switching to one year directly. That would avoid changing it again anytime soon, and synchronizing on a calendar year is much simpler to follow and communicate. We also recommend announcing the change well in advance. We currently have an opportunity to make the switch when we reach the end of the release naming alphabet, which would also greatly simplify the communications around the change.
Finally, it is worth mentioning the impact on the stable branch work. Releasing less often would likely impact the number of stable branches that we keep maintaining, so that we do not go too far into the past (and hit unmaintained distributions or long-gone dependencies). We currently maintain releases for 18 months before they switch to extended maintenance, which results in between 3 and 4 releases being maintained at the same time. We'd recommend maintaining one-year releases for 24 months, which would result in between 2 and 3 releases being maintained at the same time. Such a change would lead to longer maintenance for our users while reducing backporting work for our developers.
-- Thierry Carrez (ttx) On behalf of the OpenStack Release Management team
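The overlap figures above follow from simple arithmetic: with a cycle length C and a maintenance window M, M/C releases sit inside the maintenance window, plus at most one more around the handoff to extended maintenance. A small sketch in Python, under that simplifying assumption:

    def concurrent_maintained(cycle_months, maint_months):
        # maint_months // cycle_months releases fit inside the maintenance
        # window; at most one more is in flight around the handoff to
        # extended maintenance, hence the "between N and N+1" ranges.
        base = maint_months // cycle_months
        return base, base + 1

    print(concurrent_maintained(6, 18))   # (3, 4): current 6-month model
    print(concurrent_maintained(12, 24))  # (2, 3): proposed 1-year model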
In general, my 2 cents on the proposals, and a couple of ideas to maybe consider for easing the pain of the current model:
I really think the slowing pace (not due to lack of work items) is real, and the biggest concern I have with a longer cycle is that it would drive us to lose even more momentum and valuable contributions (see my comment at the end of "current trade-off"). It would also increase a lot of the pressure to backport features (no matter how much we claim this not being the case, we've seen it very clearly first hand downstream when we moved our product cycle to cover multiple upstream development cycles). Like Dan mentioned early on this thread, the downstream releases from different sources won't align anyway, and I think it would contribute even more towards the drive of doing work downstream rather than upstream. In general I hate the idea of LTS even more than a longer cycle, as then you effectively need to maintain 2 forks and put a lot of pressure on future contributors to align them, just so that you can have yet another forking point to start diverging again.
Perhaps a couple of things we could consider for easing the pain of upgrades over multiple cycles, and the load of the release itself, while still balancing the momentum of the "break point":
1) Compress the last few weeks of the release, maybe aligning feature freeze with RC and bringing the final library releases closer as well. This might require us to have RCs of clients to ease the integration pressure. This could result in a few more RCs being tagged as things are discovered later, but those tags are fairly cheap.
2) Have any breaking changes (removal of anything deprecated) and disruptive db migrations happen only on the cycle released in the second half of the year, while the first half would focus on bug fixes, non-intrusive feature work, etc.
3) Move to a maximum of a single community goal per year, targeted to be finished on that second release (as they seem to be more and more disruptive in nature).
As a bonus point, I'd like to see a call to action to cut the test loads we're generating. I think our gating across the projects has some serious redundancy (I still remember the panic when our average check and gate runs reached the 1hr mark). It's been great to see the efforts of covering more with our testing, and that is very important too, but I still think we're eating a lot of infra resources that could be freed up (especially easing the last weeks of any release point) without losing the quality of our testing.
This would give us the opportunity to have a real coordinated release as a checkpoint to get things done, but allow distributions and consumers to worry about the major upgrade pain of only one release per year. It would still give us 2 times a year to hype about the new release and all the positives coming with it, and keep the planning of work, resources & commitments in more manageable chunks. Most importantly, apart from not allowing the breaking changes in the first release of the year, we should treat both of them as 1st-class citizens of releases, not "Development and LTS" or anything like that, just a concentration of pain to the later one. - Erno "jokke" Kuvaja
participants (15)
- Arnaud
- Belmiro Moreira
- Dan Smith
- Dmitry Tantsur
- Erno Kuvaja
- Ghanshyam Mann
- Jean-Philippe Evrard
- Jeremy Stanley
- JP E
- Julia Kreger
- Mohammed Naser
- Sean Mooney
- Slawek Kaplonski
- Thierry Carrez
- Thomas Goirand