<div dir="ltr">Is it possible to limit how many patches run tests in the gate at once while the probability of failure is this high? For example, allow only the top 4 to run tests and put the rest in the 'queued' state. Otherwise, the already elevated probability of a patch failing is compounded, because the patch gets retested every time a patch ahead of it in the queue fails. <div>
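<div>As a rough illustration of the concern, here is a toy calculation. It assumes that each test run of an otherwise correct patch fails spuriously with an independent probability p_fail, and that any failed run counts against the patch. Both are simplifying assumptions for the sake of the sketch, not a description of how the gate actually schedules or ejects changes, and the 5% figure is hypothetical rather than measured.</div>

```python
def survival_probability(p_fail, runs):
    """Chance that a correct patch passes every one of `runs` test runs,
    assuming each run fails spuriously and independently with p_fail."""
    return (1.0 - p_fail) ** runs


# Hypothetical per-run spurious failure rate (an assumption, not a measurement).
p_fail = 0.05

# Each failure ahead in the queue forces one more run of the same patch.
for runs in (1, 2, 4, 8):
    print(runs, round(survival_probability(p_fail, runs), 3))
```

<div>Under these assumptions, every extra retest multiplies the patch's chance of getting through by (1 - p_fail), which is why a shallower testing window might help while failure rates are elevated.</div>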


<br></div><div>--</div><div>Kevin Benton</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jun 5, 2014 at 5:07 AM, Sean Dague <span dir="ltr">&lt;<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You may all have noticed things are really backed up in the gate right<br>

now, and you would be correct. (Top of gate is about 30 hrs, but if you<br>

do the math on ingress / egress rates the gate is probably really double<br>

that in transit time right now).<br>

<br>

We've hit another threshold where there are so many really small races<br>

in the gate that they are compounding to the point where a fix for one<br>

often fails because another one kills your job. This whole situation was<br>

exacerbated by the fact that while the transition from HP cloud 1.0 -><br>

1.1 was happening and we were under capacity, the check queue grew to<br>

500 with lots of stuff being approved.<br>

<br>

That flush all hit the gate at once. But it also means that those jobs<br>

passed in a very specific timing situation, which is different on the<br>

new HP cloud nodes. And the normal statistical distribution of some jobs<br>

on RAX and some on HP that shake out different races didn't happen.<br>

<br>

At this point we could really use help getting focus on only recheck<br>

bugs. The current list of bugs is here:<br>

<a href="http://status.openstack.org/elastic-recheck/" target="_blank">http://status.openstack.org/elastic-recheck/</a><br>

<br>

Also our categorization rate is only 75% so there are probably at least<br>

2 critical bugs we don't even know about yet hiding in the failures.<br>

Helping categorize here -<br>

<a href="http://status.openstack.org/elastic-recheck/data/uncategorized.html" target="_blank">http://status.openstack.org/elastic-recheck/data/uncategorized.html</a><br>

would be handy.<br>

<br>

We're coordinating changes via an etherpad here -<br>

<a href="https://etherpad.openstack.org/p/gatetriage-june2014" target="_blank">https://etherpad.openstack.org/p/gatetriage-june2014</a><br>

<br>

If you want to help, jumping in #openstack-infra would be the place to go.<br>

<span class="HOEnZb"><font color="#888888"><br>

        -Sean<br>

<br>

--<br>

Sean Dague<br>

<a href="http://dague.net" target="_blank">http://dague.net</a><br>

<br>

</font></span><br>_______________________________________________<br>

OpenStack-dev mailing list<br>

<a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

<br></blockquote></div><br>

</div>