[ironic] [qa] ironic-tempest-plugin CI bloat

Dmitry Tantsur dtantsur at redhat.com
Wed Jan 2 18:39:00 UTC 2019


On 1/2/19 7:24 PM, Clark Boylan wrote:
> On Wed, Jan 2, 2019, at 3:18 AM, Dmitry Tantsur wrote:
>> Hi all and happy new year :)
>>
>> As you know, tempest plugins are branchless, so the CI of
>> ironic-tempest-plugin has to run tests on all supported branches.
>> Currently it amounts to 16 (!) voting devstack jobs. With each of them
>> having some small probability of a random failure, it is impossible to
>> land anything without at least one recheck, usually more.
>>
>> The bad news is, we only run the master API tests job, and these tests
>> change more often than the others. We already had a minor stable branch
>> breakage because of it [1]. We need to run 3 more jobs: for Pike, Queens
>> and Rocky. And I've just spotted a missing master multinode job, which
>> is defined but does not run for some reason :(
>>
>> Here is my proposal to deal with gate bloat on ironic-tempest-plugin:
>>
>> 1. Do not run CI jobs at all for unsupported branches and branches in extended
>> maintenance. For Ocata this has already been done in [2].
>>
>> 2. Make jobs running against N-3 (currently Pike) and older branches
>> non-voting (and thus remove them from the gate queue). I have a gut
>> feeling that a change that breaks N-3 is very likely to break N-2
>> (currently Queens) as well, so it's enough to have N-2 voting.
>>
>> 3. Make the discovery and the multinode jobs from all stable branches
>> non-voting. These jobs cover the tests that get changed very infrequently (if
>> ever). These are also the jobs with the highest random failure rate.
> 
> Has any work been done to investigate why these jobs fail? And if not, maybe we should stop running the jobs entirely. Non-voting jobs that aren't reliable will just get ignored.

From my experience it's PXE failing or just generic timeouts on slow nodes.
Note that they still don't fail too often; it's their total number that makes
it problematic. When you have 20 jobs each failing at, say, a 5% rate, there
is only about a 36% chance of all of them passing (unless my math is off).

But to answer your question, yes, we do put work into that. We just have never
gotten random failures down to 0%.
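The arithmetic above assumes that each job fails independently with the same
probability p, so n jobs all pass with probability (1 - p)^n. Here is a minimal
Python sketch under that simplifying independence assumption; the 5% rate is the
hypothetical figure from the paragraph above, not a measured one. It compares
the 20-job total with the 16 voting jobs we have today and the 11 voting jobs
the proposal would leave (only voting jobs can block a merge; non-voting ones
just report).

```python
def gate_pass_probability(num_jobs: int, failure_rate: float) -> float:
    """Chance that every one of num_jobs independent jobs passes.

    Assumes each job fails independently with the same failure_rate,
    which is a simplification of real CI behaviour.
    """
    return (1.0 - failure_rate) ** num_jobs


# Hypothetical 5% random failure rate per job, as in the text above.
for jobs in (20, 16, 11):
    print(f"{jobs} jobs at 5% each: {gate_pass_probability(jobs, 0.05):.1%}")
# prints:
# 20 jobs at 5% each: 35.8%
# 16 jobs at 5% each: 44.0%
# 11 jobs at 5% each: 56.9%
```

Under these assumptions, going from 16 to 11 voting jobs raises the chance of a
clean first run from roughly 44% to roughly 57%.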

> 
>>
>> 4. Add the API test jobs, voting for Queens through master and
>> non-voting for Pike (as proposed above).
>>
>> This should leave us with 20 jobs, only 11 of which are voting. That is
>> still a lot, but probably manageable.
>>
>> The corresponding change is [3], please comment here or there.
>>
>> Dmitry
>>
>> [1] https://review.openstack.org/622177
>> [2] https://review.openstack.org/621537
>> [3] https://review.openstack.org/627955
>>
> 
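As a rough illustration of how points 1-3 of the proposal quoted above combine,
here is a minimal Python sketch that classifies a job by branch and job type.
The branch list (newest first, as of this mail), the set of untested branches
and the job-type labels such as "api", "discovery" and "multinode" are
illustrative assumptions, not the actual job definitions in [3].

```python
from enum import Enum


class Status(Enum):
    SKIP = "not run"
    NON_VOTING = "non-voting"
    VOTING = "voting"


# Illustrative branch list, newest first: the index is how many releases
# a branch is behind master (master = N, pike = N-3).
BRANCHES = ["master", "rocky", "queens", "pike", "ocata"]

# Assumed set of unsupported / extended-maintenance branches (point 1).
NOT_TESTED = {"ocata"}

# Hypothetical labels for the rarely changing, flakier jobs (point 3).
RARELY_CHANGED = {"discovery", "multinode"}


def job_status(branch: str, job_type: str) -> Status:
    """Classify a CI job according to points 1-3 of the proposal."""
    if branch in NOT_TESTED:
        return Status.SKIP  # point 1: do not run at all
    age = BRANCHES.index(branch)
    if age >= 3:
        return Status.NON_VOTING  # point 2: N-3 and older
    if branch != "master" and job_type in RARELY_CHANGED:
        return Status.NON_VOTING  # point 3: stable discovery/multinode
    return Status.VOTING


for branch in BRANCHES:
    print(branch, {job: job_status(branch, job).value
                   for job in ("api", "discovery", "multinode")})
```

Point 4 then falls out of the same rules: the API job votes on Queens and
Rocky, while the Pike one is non-voting because Pike is N-3.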




More information about the openstack-discuss mailing list