[openstack-dev] Current biggest OpenStack gate fail culprit - neutron bug #1194026

Sean Dague sean at dague.net
Thu Jul 11 16:11:25 UTC 2013


On 07/11/2013 11:54 AM, Sean Dague wrote:
> On 07/11/2013 11:33 AM, Matthew Treinish wrote:
>> On Thu, Jul 11, 2013 at 08:16:26AM -0700, Dan Smith wrote:
>>>> In the corner to my left, our current largest gate reset culprit
>>>> appears to be neutron bug #1194026 - weighing in with 62 rechecks
>>>> since June 24th (http://status.openstack.org/rechecks/)
>>>
>>> So, with some of the highest rates of patch traffic we've seen over the
>>> last couple of weeks before the H2 deadline, I think this is really
>>> becoming a problem. I think merge times are through the roof as a
>>> result.
>>>
>>> Since the neutron gate is not a full tempest run, I think we should
>>> consider making a temporary change. I know that turning it into a
>>> non-voting job is not a popular solution, and I hate to even suggest
>>> it. However, it's just a subset of the tests anyway and I think the
>>> impact is currently overshadowing the potential for regression
>>> detection, given the relatively small amount of coverage. Is this
>>> something people would consider?
>>
>> I don't think this is the way to go. Even though it's limited coverage,
>> without it Neutron would have no gating integrated testing run on it
>> at all. In my experience this will just cause more difficulty down the
>> road when we decide to switch it back to voting. Things tend to bit rot
>> fairly quickly.
>>
>>>
>>> Of course, the other option is to try to skip the offending test if
>>> we're running with neutron support, which may help. Since we don't know
>>> what the problem is and it *seems* to be an issue with resources not
>>> becoming available before a timeout (AIUI), I worry that this will just
>>> move the problem elsewhere.
>>
>> So if it is a single test (or set of tests) failing then this is
>> doable. We can do this in the short term, but if it just moves the
>> problem elsewhere then we're just in the same situation right? So
>> what's the harm in trying this?
>
> Let's start with the test skip.
>
> I am however pretty frustrated that we're really not getting anyone from
> neutron looking at this. We're at 121 rechecks (plus I'm sure there were
> plenty of no-bug rechecks; I've seen a couple). So that's 150+ gate resets
> because of this bug, which is 150 hours' worth of delay put into the gate.

Actually, I'm revising my point of view. If we skip the test, people
can't debug in the gate. If we make the job non-voting, the neutron team
can submit patches and run rechecks on them to try to reproduce the failure.

So let's go non-voting here.

	-Sean

-- 
Sean Dague
http://dague.net


