[openstack-dev] [all] OpenStack races piling up in the gate - please stop approving patches unless they are fixing a race condition

Kevin Benton blak111 at gmail.com
Thu Jun 5 22:29:17 UTC 2014


Is it possible to make the depth of patches running tests in the gate very
shallow while the probability of failure is this high? For example, allow
only the top 4 to run tests and put the rest in the 'queued' state.
Otherwise the already elevated probability of a patch failing is
exacerbated by the fact that it gets retested every time a patch ahead of
it in the queue fails.
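
A rough sketch of that effect, under a deliberately simplified model that
is an assumption here and not a description of the actual gate scheduler:
every patch in the window runs at once, each run fails independently with
the same probability, and a failure ahead throws away every run behind it.
The function names and the 0.2 failure rate are hypothetical.

```python
# Simplified model (an assumption, not the real gate scheduler):
# every patch in the window runs its tests at the same time, each run
# fails independently with probability p_fail, and a failure at any
# position ahead throws away the runs of every patch behind it.

def prob_run_discarded(position, p_fail):
    """Chance that the run at 1-based `position` is thrown away because
    at least one of the patches ahead of it fails."""
    return 1.0 - (1.0 - p_fail) ** (position - 1)


def expected_discarded_runs(depth, p_fail):
    """Expected number of runs thrown away in one window of `depth` patches."""
    return sum(prob_run_discarded(k, p_fail) for k in range(1, depth + 1))


if __name__ == "__main__":
    p_fail = 0.2  # hypothetical per-run failure rate, not a measured value
    for depth in (4, 20):
        wasted = expected_discarded_runs(depth, p_fail)
        print(f"depth {depth:2d}: {wasted:.2f} of {depth} runs expected to be discarded")
```

Under these assumptions a shallow window throws away a much smaller share
of the runs it starts than a deep one does, which is the argument for
capping the depth while failure rates are high.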

--
Kevin Benton


On Thu, Jun 5, 2014 at 5:07 AM, Sean Dague <sean at dague.net> wrote:

> You may all have noticed things are really backed up in the gate right
> now, and you would be correct. (Top of gate is about 30 hrs, but if you
> do the math on ingress / egress rates, the real transit time through the
> gate is probably double that right now).
>
> We've hit another threshold where there are so many really small races
> in the gate that they are compounding to the point where a fix for one
> often fails because another one kills your job. This whole situation was
> exacerbated by the fact that while the transition from HP cloud 1.0 ->
> 1.1 was happening and we were under capacity, the check queue grew to
> 500 with lots of stuff being approved.
>
> That backlog all hit the gate at once. It also means that those jobs
> passed under very specific timing conditions, which differ on the new HP
> cloud nodes. And the normal statistical mix of some jobs on RAX and some
> on HP, which shakes out different races, didn't happen.
>
> At this point we could really use help focusing only on recheck bugs.
> The current list of bugs is here:
> http://status.openstack.org/elastic-recheck/
>
> Also our categorization rate is only 75% so there are probably at least
> 2 critical bugs we don't even know about yet hiding in the failures.
> Helping categorize here -
> http://status.openstack.org/elastic-recheck/data/uncategorized.html
> would be handy.
>
> We're coordinating changes via an etherpad here -
> https://etherpad.openstack.org/p/gatetriage-june2014
>
> If you want to help, jumping in #openstack-infra would be the place to go.
>
>         -Sean
>
> --
> Sean Dague
> http://dague.net

