[all][tc] Relmgt team position on release cadence

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Tue Nov 9 13:50:31 UTC 2021


Hi,
It's time again to discuss the release cycle...
Considering the number of times we have discussed the release cycle
lately, we should acknowledge that we really have a problem, or at
least that we have very different opinions in the community, and we
should discuss it openly.

Thanks, Thierry, for bringing up the topic again.

Looking at the last user survey, we see that 23% of the deployments
are running one of the last two releases, and then we have a long...
long... tail of older releases.

Honestly, I have mixed feelings about it!

As an operator I relate more to having an LTS release and the
possibility to upgrade between LTS releases. But being able to upgrade
every 6 months is also very interesting for small and fast-moving
projects.

Maybe a 1-year release cycle would provide the middle ground here.

In our cloud infrastructure we run different releases, from Stein to
Victoria.
There are projects that we can easily upgrade (and we do!) and other
projects that are much more complicated (because of feature
deprecations, Operating System dependencies, internal patches, or
simply because it is too risky considering the current workloads).
For those we definitely need more than 6 months for the upgrade.

If once again we don't reach a consensus to change the release cycle, we
should at least continue working on improving the upgrade experience
(and don't get me wrong... the upgrade experience has improved
tremendously over the years).

There are small things that change in the projects (most of them good
refactors) that can be a big headache for upgrades.
Let me enumerate some: DB schema changes, which usually translate into
offline upgrades; configuration changes (options that move to different
configuration groups without bringing anything new, changed defaults,
policy changes); architecture changes (new projects that are now
mandatory); ... A small sketch of the configuration-group case follows
below.
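
To make the configuration-group case concrete, here is a minimal sketch
(the option and group names are hypothetical, not taken from any real
project) of how oslo.config's deprecated_name/deprecated_group can keep
the old location readable for a cycle while an option moves, instead of
breaking existing configuration files at upgrade time:

    # Hypothetical option that used to live in [DEFAULT] and now lives
    # in a new [agent] group.
    from oslo_config import cfg

    CONF = cfg.ConfigOpts()
    CONF.register_opt(
        cfg.IntOpt('report_interval',
                   default=60,
                   # Still read the old [DEFAULT]/report_interval location
                   # (with a deprecation warning), so existing config files
                   # keep working for at least one cycle.
                   deprecated_name='report_interval',
                   deprecated_group='DEFAULT',
                   help='Seconds between agent status reports.'),
        group='agent')

    CONF(args=[])                      # parse config files / CLI args
    print(CONF.agent.report_interval)  # new location, old files still honored

When the old location is used, oslo.config emits a deprecation warning,
which gives deployments a full cycle to move the setting rather than an
immediate hard break.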

In my opinion, if we reduce those, or at least become more aware of the
challenges they impose on operators, we will make upgrades easier and
hopefully see deployments move much faster, whatever the release cycle
is.

cheers,
Belmiro

On Tue, Nov 9, 2021 at 12:04 AM Arnaud <arnaud.morin at gmail.com> wrote:

> Hey,
> I'd like to add my 2 cents.
>
> It's hard to upgrade a region, so when it comes to upgrading multiple
> regions, it's even harder.
>
> Some operators also have their own downstream patches / extensions /
> drivers, which make the upgrade process more complex, so it takes more time
> (for all the reasons already given in the thread: the need to update the CI,
> the tools, the docs, the people, etc.).
>
> One more thing is about consistency: when you have to manage multiple
> regions, it's easier if all of them are pretty much identical. Human
> operations are always the same, and can eventually be automated.
> This leads to sticking with a fixed version of OpenStack to run the
> business.
> When scaling, you (we) always choose security and consistency.
>
> Also, Julia mentioned something true about contributions from operators.
> It's difficult for them for multiple reasons:
> - pushing upstream is a process, which needs to be taken into account when
> working on an internal fix.
> - it's usually quicker to push downstream because it's needed. When it
> comes to upstream, it's challenged by the developers (and that's good), so it
> takes time and can be discouraging.
> - operators are not running master, but a stable release. Bugs on stable
> branches may be fixed differently than on master, which can also be
> discouraging.
> - writing unit tests is a job in itself; some operators are not necessarily
> developers, so this can also be a challenge.
>
> All of this is to say that helping people who are proposing a patch is a
> good thing. And as far as I can see, upstream developers do help most of
> the time, and we should keep and encourage such behavior, IMHO.
>
> Finally, I would also vote for fewer releases or LTS releases (though the
> latter looks heavier to maintain). I think this would help operators keep
> up to date with stable branches and propose more patches.
>
> Cheers,
> Arnaud.
>
>
> On 8 November 2021 20:43:18 GMT+01:00, Julia Kreger <
> juliaashleykreger at gmail.com> wrote:
>>
>> On Mon, Nov 8, 2021 at 10:44 AM Thierry Carrez <thierry at openstack.org> wrote:
>>
>>>
>>>  Ghanshyam Mann wrote:
>>>
>>>>  [...]
>>>>  Thanks Thierry for the detailed write up.
>>>>
>>>>  At the same time, a shorter release cycle creates upgrade-often pressure but
>>>>  contains a smaller number of changes/features, which makes the upgrade easier,
>>>>  while a longer-release model will have more changes/features, which will make
>>>>  the upgrade more complex.
>>>>
>>>
>>>  I think that was true a few years ago, but I'm not convinced that still
>>>  holds. We currently have a third of the change volume we had back in
>>>  2015, so a one-year release in 2022 would contain far fewer changes than
>>>  a 6-month release from 2015.
>>>
>>
>> I concur. Also, in 2015, we were still very much in a "move fast" mode
>> of operation as a community.
>>
>>  Also, thanks to our testing and our focus on stability, the pain linked
>>>  to the amount of breaking changes in a release is now negligible
>>>  compared to the basic pain of going through a 1M-core deployment and
>>>  upgrading the various pieces... every 6 months. I've heard of multiple
>>>  users claiming it takes them close to 6 months to upgrade their massive
>>>  deployments to a new version. So when they are done, they have to start
>>>  again.
>>>
>>>  --
>>>  Thierry Carrez (ttx)
>>>
>>>
>> I've been hearing the exact same messaging from larger operators as
>> well as operators in environments where they are concerned about
>> managing risk for at least the past two years. These operators have
>> indicated it is not uncommon for the upgrade projects which consume,
>> test, certify for production, and deploy to production to take *at least*
>> six months to execute. At the same time, they are shy of being the
>> ones to also "find all of the bugs", and so the project doesn't
>> actually start until well after the new coordinated release has
>> occurred. Quickly they become yet another version behind with this
>> pattern.
>>
>> I suspect it is really easy for us as a CI focused community to think
>> that six months is plenty of time to roll out a fully updated
>> deployment which has been fully tested in every possible way. Except,
>> these operators are often trying to do just that on physical hardware,
>> with updated firmware and operating systems bringing in new variables
>> with every single change which may ripple up the entire stack. These
>> operators then have to apply the lessons they have previously learned
>> once they have worked through all of the variables. In some cases this
>> may involve aspects such as benchmarking, to ensure they don't need to
>> make additional changes which need to be factored into their
>> deployment, sending them back to the start of their testing. All while
>> thinking of phrases like "business/mission critical".
>>
>> I guess this means I'm in support of revising the release cycle. At
>> the same time, I think it would be wise for us to see if we can learn
>> from these operators the pain points they experience, the process they
>> leverage, and ultimately see if there are opportunities to spread
>> knowledge or potentially tooling. Or maybe even get them to contribute
>> their patches upstream. Not that all of these issues are easily solved
>> with any level of code, but sometimes they can include contextual
>> disconnects, and resolving those is just as important as shipping a
>> release, IMHO.
>>
>> -Julia
>>
>>

