Mark Goddard wrote:
> [...]
> As kolla PTL and ironic release liaison I've proposed a number of
> release patches recently. Generally the release team is good at churning
> through these, but sometimes patches can hang around for a while.
> Usually a ping on IRC will get things moving again within a day or so
> (thanks in particular to Sean who has been very responsive).
I agree we've seen an increase in processing delay lately, and I'd like
to correct that. There are generally three things that would cause a
perceptible delay in release processing...
1- wait for two release managers +2
This is something we put in place some time ago, as we had a lot of new
members and thought that would be a good way to onboard them. Lately it
has created delays, as many of them are no longer as active.
2- stable releases
Two subcases in there... Either the deliverable is under stable policy
and there are *significant* delays there as we have to pause to give a
chance to stable-maint-core people to voice an opinion. Or the
deliverable is not under stable policy, but we do a manual check on the
changes, as a way to educate the requester on semver.
3- waiting for PTL/release liaison to approve
That can take a long time, but the release management team is not really
at fault there.
Could you describe where you've seen "sometimes patches can hang around
for a while"? I suspect they belong in the (2) category?
I hadn't realised there was a requirement for the stable team to review stable patches. That could explain some of my experience. It could also be because kolla is cycle-trailing, so we often make releases at unusual times.
> [...]
> I have a few questions for the release team about these reviews.
>
> * What manual checks do you do beyond those that are currently automated?
See https://releases.openstack.org/reference/reviewer_guide.html
> * Could the above checks be automated?
We aggressively automate everything that can be. For example, I'm currently
working to automate the check that the release was approved by the PTL
or release liaison.
> * What issues have you caught that were not caught by CI jobs?
It's generally semver violations, or timing issues (like requesting a
release during a freeze). Sometimes it's corner cases not handled (yet)
by automation, like incompatibility between the release version asked
and the deliverable release model. You can look at the history of
releases for examples.
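To illustrate the kind of semver check mentioned above, here is a minimal
sketch in Python. It is not the release team's actual tooling: the function
names are hypothetical, and it assumes plain "X.Y.Z" version strings with
no pre-release suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'X.Y.Z' string into integers (assumes no suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(previous: str, requested: str) -> str:
    """Classify the jump from `previous` to `requested`.

    Returns 'major', 'minor', 'patch', or 'invalid'. A request is
    'invalid' when it does not increase the version, or when the
    lower components are not reset to zero after a major/minor bump.
    """
    old = parse_version(previous)
    new = parse_version(requested)
    if new <= old:
        return "invalid"
    if new[0] > old[0]:
        return "major" if new[1:] == (0, 0) else "invalid"
    if new[1] > old[1]:
        return "minor" if new[2] == 0 else "invalid"
    return "patch"


if __name__ == "__main__":
    for old, new in [("1.2.3", "1.2.4"), ("1.2.3", "1.3.0"), ("1.2.3", "1.3.1")]:
        print(old, "->", new, ":", bump_kind(old, new))
```

A reviewer would still need to compare the bump kind against the actual
changes (bug fixes only, new features, incompatible changes), which is the
part that is harder to automate.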
> Hopefully I haven't offended anyone here. There's often more involved
> with these things than you first suspect.
Decentralizing would be a lot of work to create new systems and
processes... and I don't think we can automate everything. It's
unreasonable to expect everyone to know the release process by heart and
respect timing and freezes. And releases are the only thing we produce
that we can't undo.
I would rather eliminate the issue by making sure release processing is
fast again. So here is my proposal:
- go back to single release manager approval
This seems like it should make a big difference - reducing the load on reviewers and the requirements for approval should reduce time in flight.
- directly approve stable releases after a cursory semver check, not
waiting for stable-maint-core approval.
That should make sure all releases are processed within a couple of
days, which I think is a good trade-off between retaining some releases
for 10+ days and not having a chance to catch odd cases before releases
at all.
Thoughts?
Thanks for the detailed response. I tend to prefer models where teams can be self-sufficient using shared tooling and policies, but I'm also missing some context and history, and don't have to clean up when things go wrong. Ultimately, you've proposed some simple changes which should improve the situation, so that's a good result in my view.
--
Thierry Carrez (ttx)