Re: [all] Gate resources and performance

8 Feb 2021


      On Sat, 2021-02-06 at 20:51 +0100, Slawek Kaplonski wrote:
...
Hi,
On Saturday, 6 February 2021 10:33:17 CET Dmitry Tantsur wrote:
...
On Sat, Feb 6, 2021 at 12:10 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
...
On 2021-02-05 22:52:15 +0100 (+0100), Dmitry Tantsur wrote:
[...]
...
7.1. Stop marking dependent patches with Verified-2 if their
parent fails in the gate, keep them at Verified+1 (their previous
state). This is a common source of unnecessary rechecks in the
ironic land.
[...]
Zuul generally assumes that if a change fails tests, it's going to
need to be revised.
Very unfortunately, it's far from being the case in the ironic world.
...
Gerrit will absolutely refuse to allow a change
to merge if its parent has been revised and the child has not been
rebased onto that new revision. Revising or rebasing a change clears
the Verified label and will require new test results.
This is fair, I'm only referring to the case where the parent has to be
rechecked because of a transient problem.
...
Which one or
more of these conditions should be considered faulty? I'm guessing
you're going to say it's the first one, that we shouldn't assume
just because a change fails tests that means it needs to be fixed.
Unfortunately, yes.
A parallel proposal, which has been rejected numerous times, is to allow
rechecking only the failed jobs.
Even though I fully understand the downsides of that, I would also be in
favor of having that option. Maybe allowing it only for cores would be a
good trade-off?
It would require Zuul to be fundamentally altered.
Currently triggers are defined at the pipeline level; we would have to define them per job instead.
And I'm not sure restricting it to cores would really help. It might, but unless we force the same commit
hashes to be reused, so that all jobs use the exact same version of the code, I don't think it's safe.
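
To make the safety concern concrete, here is a minimal sketch in Python. It is not Zuul code, and the names JobResult and combined_verdict are hypothetical, made up for illustration. It assumes each job result records the exact (repo, sha) pairs it was tested against, and it only trusts a combined verdict when every job saw the same code:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class JobResult:
    """Hypothetical record of one job run (not a Zuul data structure)."""
    job: str
    passed: bool
    # The (repo, commit sha) pairs that were checked out for this run.
    tested_shas: FrozenSet[Tuple[str, str]]


def combined_verdict(results: List[JobResult]) -> Optional[bool]:
    """Combine per-job results into a single verdict.

    Returns None when the jobs did not all test the same commits,
    because a mixed set of results says nothing about any one state
    of the code. Otherwise returns True only if every job passed.
    """
    if not results:
        return None
    distinct_states = {r.tested_shas for r in results}
    if len(distinct_states) != 1:
        return None
    return all(r.passed for r in results)
```

If only the failed job were rerun later, the branch would usually have moved on, so the rerun would carry different hashes and the combined verdict in this sketch would be None rather than a pass.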
...
...
Dmitry
...
This takes us back to the other subthread, wherein we entertain the
notion that if changes have failing jobs and the changes themselves
aren't at fault, then we should accept this as commonplace and lower
our expectations.
Keep in mind that the primary source of pain here is one OpenStack
has chosen. That is, the "clean check" requirement that a change get
a +1 test result in the check pipeline before it can enter the gate
pipeline. This is an arbitrary pipeline criterion, chosen to keep
problematic changes from getting approved and making their way
through the gate queue like a wrecking-ball, causing repeated test
resets for the changes after them until they reach the front and
Zuul is finally able to determine they're not just conflicting with
other changes ahead. If a major pain for Ironic and other OpenStack
projects is the need to revisit the check pipeline after a gate
failure, that can be alleviated by dropping the clean check
requirement.
Without clean check, a change which got a -2 in the gate could
simply be enqueued directly back to the gate again. This is how it
works in our other Zuul tenants. But the reason OpenStack started
enforcing it is that reviewers couldn't be bothered to confirm
changes really were reasonable, had *recent* passing check results,
and that observed job failures were truly unrelated to the
changes themselves.
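
As a rough illustration of the clean check rule described above, here is a minimal Python sketch. It is a simplified model, not Zuul's implementation: the Change class and the function names are hypothetical, and approval is reduced to a single flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical, simplified view of a change under review."""
    number: int
    approved: bool = False           # a reviewer has approved it for merging
    verified: Optional[int] = None   # latest Verified vote from CI, if any


def can_enter_gate(change: Change, clean_check: bool = True) -> bool:
    """Decide whether a change may be enqueued into the gate pipeline."""
    if not change.approved:
        return False
    if not clean_check:
        # Without clean check, approval alone is enough, even after a -2.
        return True
    # With clean check, a positive Verified vote is required first.
    return change.verified is not None and change.verified >= 1


def report_gate_failure(change: Change) -> None:
    """A failed gate run leaves Verified-2 on the change."""
    change.verified = -2


def report_check_success(change: Change) -> None:
    """A passing check run leaves Verified+1 on the change."""
    change.verified = 1
```

In this model, a change that fails in the gate has to go back through check (report_check_success) before can_enter_gate returns True again, unless clean_check is False, in which case it can be re-enqueued directly.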
--
Jeremy Stanley

Sean Mooney