[openstack-dev] [infra] redo gate jobs only

Eric K ekcs.openstack at gmail.com
Thu Aug 3 17:57:07 UTC 2017

Thanks a lot Jeremy and Andreas for the reference and the added context!

On 8/3/17, 10:49 AM, "Jeremy Stanley" <fungi at yuggoth.org> wrote:

>On 2017-08-03 08:15:36 +0200 (+0200), Andreas Jaeger wrote:
>> "A patchset has to be approved to run tests in the gate pipeline. If the
>> patchset has failed in the gate pipeline (it will have been approved to
>> get into the gate pipeline) a recheck will first run the check jobs and
>> if those pass, it will again run the gate jobs. There is no way to only
>> run the gate jobs; the check jobs will first be run again."
>The reasons being:
>1. There's no good way to decide how long is too long to wait
>between passing jobs in check and running jobs in the gate. We used
>to not enforce this "clean check" policy, and developers would
>reverify broken changes back into the gate pipeline over and over,
>creating a significant amount of additional disruption, because
>their change had passed check jobs once (perhaps many months
>earlier). Now they at least only get to disrupt the gate once, when
>that change gets approved; after that it won't be able to make it
>back into the gate until a fixed revision is uploaded, and so it
>doesn't further slow down the merging of unrelated changes.
>2. If a change passes jobs once (in check) and then fails later (in
>the gate) then there's a fair chance it's introducing a
>nondeterministic bug (one which only manifests sometimes but not on
>every run). Back when we used to allow reverification directly in
>the gate pipeline for changes which passed check, we had people
>rechecking flaky changes until they passed and then reverifying them
>over and over after approval until they made it through the gate.
>Under these conditions, on average one recheck followed by one
>reverify could merge a change which failed jobs 50% of the time; 9
>rechecks and 9 reverifies could merge a change which failed jobs 90%
>of the time. With the current requirement to pass both check and
>gate in series, it takes on average 3 rechecks to merge a 50% failing
>change and 99 rechecks to merge a 90% failing change.
>So basically if a change fails in the gate pipeline, there's good
>reason for it to get increased scrutiny at least in the form of
>trying the jobs again in the check pipeline before going back to the
>gate once more.
>Jeremy Stanley
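
For anyone wanting to check the retry figures quoted above, here is a rough sketch of the arithmetic. It assumes that every run of a change's jobs fails independently with the same probability, so the number of attempts until the first pass is geometric with mean 1/p. The function names are made up for illustration and are not part of Zuul or any infra tooling.

```python
from fractions import Fraction


def expected_attempts(pass_probability: Fraction) -> Fraction:
    """Mean number of independent attempts until the first success."""
    if not 0 < pass_probability <= 1:
        raise ValueError("pass probability must be in (0, 1]")
    return 1 / pass_probability


def retries_when_independent(fail_rate: Fraction):
    """Old behaviour: check and gate can each be retried on their own.

    Returns (average rechecks, average reverifies), i.e. the attempts
    beyond the first run in each pipeline.
    """
    p = 1 - fail_rate
    extra = expected_attempts(p) - 1
    return extra, extra


def rechecks_when_in_series(fail_rate: Fraction) -> Fraction:
    """Clean check: one attempt only succeeds if check AND gate both pass.

    Returns the average number of rechecks beyond the first attempt.
    """
    p = 1 - fail_rate
    return expected_attempts(p * p) - 1


if __name__ == "__main__":
    for fail_rate in (Fraction(1, 2), Fraction(9, 10)):
        rechecks, reverifies = retries_when_independent(fail_rate)
        series = rechecks_when_in_series(fail_rate)
        print(f"fail rate {float(fail_rate):.0%}: "
              f"independent = {rechecks} rechecks + {reverifies} reverifies, "
              f"in series = {series} rechecks")
```

Under that independence assumption this gives 1 recheck plus 1 reverify (50% failing) and 9 plus 9 (90% failing) for the old behaviour, against 3 and 99 rechecks once check and gate have to pass back to back. Fractions are used instead of floats only so that the averages come out exact.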
