---- On Fri, 17 May 2024 09:03:15 -0700 Jeremy Stanley wrote ---
On 2024-05-17 17:27:12 +0200 (+0200), Dmitriy Rabotyagov wrote:
Just for projects to be aware: since these are now removed from the Zuul project list, all projects that still had them in required-projects for their Zuul jobs ended up with a broken Zuul config.
What's worse, it was failing only *some* jobs that were requiring these repos. As a result, around 20 OpenStack-Ansible patches landed without gating: the docs job was the only one passing, and that was sufficient to set Verified+2. [...]
Yes, that's expected behavior at least from Zuul's perspective. When those projects were removed from the OpenStack tenant, any job definitions which listed them as required-projects (directly or through inheritance) became invalid and were removed from its in-memory configuration. Zuul does test job configuration changes and will refuse to gate any which break job definitions, but there is no similar testing to prevent removal of projects which would result in broken job definitions.
Extending Zuul to optionally keep those jobs in the pipeline but force them to insta-fail instead of removing them might be a reasonable feature request, though I haven't thought through the possible implications (it could also just be an easy way to create deadlocks). Similarly, we could maybe work out some way to scan Zuul's loaded configuration for any references to projects being removed and then fail testing on the tenant configuration change that proposes to remove them, though that might lead to those changes just never getting merged in some cases.
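To make the scanning idea a bit more concrete, here is a rough sketch of what such a check might look like. It is not Zuul code and does not use any Zuul API: it assumes the job definitions have already been parsed into plain dicts with "name", "parent" and "required-projects" keys, and it ignores job variants and anything set in project-pipeline definitions. The function names and the "openstack/retired-repo" project are made up for illustration.

```python
from typing import Iterable


def project_names(entries: Iterable) -> set[str]:
    """Normalize required-projects entries (plain strings or dicts with a 'name' key)."""
    names = set()
    for entry in entries or []:
        names.add(entry["name"] if isinstance(entry, dict) else entry)
    return names


def jobs_referencing(jobs: list[dict], removed: set[str]) -> dict[str, set[str]]:
    """Map each job name to the removed projects it requires, directly or via parents."""
    by_name = {job["name"]: job for job in jobs}
    affected = {}
    for job in jobs:
        hits = set()
        seen = set()  # guards against accidental parent loops in the input
        current = job
        while current is not None and current["name"] not in seen:
            seen.add(current["name"])
            hits |= project_names(current.get("required-projects")) & removed
            current = by_name.get(current.get("parent"))
        if hits:
            affected[job["name"]] = hits
    return affected


# Hypothetical job definitions: "functional" inherits the stale reference.
jobs = [
    {
        "name": "base-deploy",
        "required-projects": [
            "openstack/retired-repo",
            {"name": "openstack/requirements"},
        ],
    },
    {"name": "functional", "parent": "base-deploy"},
    {"name": "docs"},
]

affected = jobs_referencing(jobs, removed={"openstack/retired-repo"})
for name in sorted(affected):
    print(name, sorted(affected[name]))
```

In this example both base-deploy and functional are flagged while docs is not, which mirrors the situation above where the docs job was the only one left running. A real check would have to work from Zuul's actual loaded configuration rather than hand-built dicts.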
IMO, this should be fixed. Ignoring jobs that are supposed to run, for any reason, produces a false positive. More failures, even in cases where Zuul could ignore the problem, are still better. Either Zuul can run and fail the job, or it can try to ignore the nonexistent things (required-projects) and see whether the job passes.
For now, the best approach is to make it clear when projects are proposed for retirement that any other projects that refer to them in their job configs should remove all references at the earliest opportunity, ideally before the actual retirement changes merge. Also, keeping a close eye on Zuul's config errors list immediately after such retirement changes merge and reaching out to the affected teams could help further reduce the window for related risks.
Yeah, I swapped these steps to avoid this, but it is still possible that cleanup changes might merge later than the infra removal. https://review.opendev.org/c/openstack/project-team-guide/+/919976 -gmann
-- Jeremy Stanley