[openstack-dev] [all] OpenStack races piling up in the gate - please stop approving patches unless they are fixing a race condition

Matt Riedemann mriedem at linux.vnet.ibm.com
Thu Jun 5 16:12:39 UTC 2014



On 6/5/2014 7:07 AM, Sean Dague wrote:
> You may all have noticed things are really backed up in the gate right
> now, and you would be correct. (Top of gate is about 30 hrs, but if you
> do the math on ingress / egress rates the gate is probably really double
> that in transit time right now).
>
> We've hit another threshold where there are so many really small races
> in the gate that they are compounding: a fix for one race often fails
> because another race kills your job. This whole situation was
> exacerbated by the fact that while the transition from HP cloud 1.0 ->
> 1.1 was happening and we were under capacity, the check queue grew to
> 500 changes with lots of stuff being approved.
>
> All of that flush hit the gate at once. It also means that those jobs
> passed in a very specific timing situation, which is different on the
> new HP cloud nodes, and the normal statistical distribution of some jobs
> on RAX and some on HP, which shakes out different races, didn't happen.
>
> At this point we could really use help focusing only on recheck
> bugs. The current list of bugs is here:
> http://status.openstack.org/elastic-recheck/
>
> Also, our categorization rate is only 75%, so there are probably at
> least two critical bugs we don't even know about yet hiding in the
> failures.
> Helping categorize here -
> http://status.openstack.org/elastic-recheck/data/uncategorized.html
> would be handy.
>
> We're coordinating changes via an etherpad here -
> https://etherpad.openstack.org/p/gatetriage-june2014
>
> If you want to help, jumping in #openstack-infra would be the place to go.
>
> 	-Sean

There are a lot of projects with 0% classification rates on unit test
job failures; I at least saw Ironic, Horizon, Cinder, and Glance.

Ceilometer UT jobs had a 14% classification rate, but I think Anita and I
have most of those fingerprinted now.

I'm hoping that people from those other projects can step up and help
classify their racy unit test bugs, or at least open bugs so we can
find logstash queries and get elastic-recheck patches up for them.

If you have questions on writing e-r patches, ping me in #openstack-qa.
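
For reference, an elastic-recheck fingerprint is just a small YAML file
in the elastic-recheck repo, named after the Launchpad bug it matches,
containing a logstash query that hits the failure signature in the job
logs. A rough sketch (the bug number, error message and job name below
are made up for illustration; the fields are whatever is indexed in
logstash, e.g. message and build_name):

    # queries/1234567.yaml -- hypothetical fingerprint for a fake bug
    query: >
      message:"TimeoutException: worker did not respond" AND
      build_name:"gate-cinder-python27"

Once a fingerprint like that merges, matching failures get counted
against the bug on http://status.openstack.org/elastic-recheck/ instead
of landing in the uncategorized pile.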

-- 

Thanks,

Matt Riedemann



