On Fri, 20 Dec 2019, 10:06 Thierry Carrez, <thierry@openstack.org> wrote:
Mark Goddard wrote:
[...] As kolla PTL and ironic release liaison, I've proposed a number of release patches recently. Generally the release team is good at churning through these, but sometimes patches can hang around for a while. Usually a ping on IRC gets things moving again within a day or so (thanks in particular to Sean, who has been very responsive).
I agree we've seen an increase in processing delays lately, and I'd like to correct that. There are generally three things that can cause a perceptible delay in release processing:
1- wait for two release managers +2
This is something we put in place some time ago, as we had a lot of new members and thought it would be a good way to onboard them. Lately it has created delays, as many of those members are not as active.
2- stable releases
There are two subcases here. Either the deliverable is under the stable policy, in which case there are *significant* delays because we have to pause to give stable-maint-core members a chance to voice an opinion. Or the deliverable is not under the stable policy, but we still do a manual check of the changes, as a way to educate the requester on semver.
3- waiting for PTL/release liaison to approve
That can take a long time, but the release management team is not really at fault there.
Could you describe where you've seen "sometimes patches can hang around for a while"? I suspect they belong in category (2)?
I hadn't realised there was a requirement for the stable team to review stable release patches. That could explain some of my experience. It could also be because kolla is cycle-trailing, so we often make releases at unusual times.
[...] I have a few questions for the release team about these reviews.
* What manual checks do you do beyond those that are currently automated?
See https://releases.openstack.org/reference/reviewer_guide.html
* Could the above checks be automated?
We aggressively automate everything that can be automated. For example, I'm currently working on automating the check that the release was approved by the PTL or release liaison.
* What issues have you caught that were not caught by CI jobs?
It's generally semver violations or timing issues (like requesting a release during a freeze). Sometimes it's corner cases not (yet) handled by automation, like an incompatibility between the requested release version and the deliverable's release model. You can look at the history of releases for examples.
Hopefully I haven't offended anyone here. There's often more involved with these things than you first suspect.
Decentralizing would be a lot of work to create new systems and processes... and I don't think we can automate everything. It's unreasonable to expect everyone to know the release process by heart and respect timing and freezes. And releases are the only thing we produce that we can't undo.
I would rather eliminate the issue by making sure release processing is fast again. So here is my proposal:
- go back to single release manager approval
This seems like it should make a big difference: reducing the load on reviewers and the requirements for approval should reduce the time patches spend in flight.
- directly approve stable releases after a cursory semver check, not waiting for stable-maint-core approval.
That should ensure all releases are processed within a couple of days, which I think is a good trade-off between holding some releases for 10+ days and having no chance at all to catch odd cases before release.
Thoughts?
Thanks for the detailed response. I tend to prefer models where teams can be self-sufficient using shared tooling and policies, but I'm also missing some context and history, and don't have to clean up when things go wrong. Ultimately, you've proposed some simple changes which should improve the situation, so that's a good result in my view.
-- Thierry Carrez (ttx)