<div dir="ltr"><div id="magicdomid1408" class="" style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Hi All,</span></span></div><div id="magicdomid1408" class="" style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><br></span></span></div><div id="magicdomid1408" class="" style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">TL;DR Last week the gate got wedged on nondeterministic failures. Unwedging the gate required drastic actions to fix bugs.</span></span></div>
<div id="magicdomid1408" class="" style="margin:0px;padding:0px"><br></div><div id="magicdomid1408" class="" style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><font color="#000000" face="Arial, sans-serif"><span style="font-size:12px;line-height:16px">Starting on November 15th, gate jobs became progressively less stable, and not enough attention was given to fixing the issues, until the gate was almost fully wedged. No single bug caused this; a collection of bugs got us here. The gate protects us from code that fails 100% of the time, but a patch that fails only 10% of the time can slip through. Add a few of these bugs together and the gate becomes fully wedged, at which point fixing it without circumventing the gate (something</span></font></span><span class="" style="color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px;margin:0px;padding:0px 0px 1px"> we</span><span class="" style="color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px;margin:0px;padding:0px 0px 1px"> never want to do) is very hard. It took just 2 new nondeterministic bugs to take us from a gate that mostly worked to a gate that was almost fully wedged. Last week we found out Jeremy Stanley (fungi) was right when he said, "nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so </span><span class="" style="color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px;margin:0px;padding:0px 0px 1px">e</span><span class="" style="color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px;margin:0px;padding:0px 0px 1px">ven when it's their patch which is introducing a nondeterministic bug."</span></span></div>
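<div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><br></div><div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px">To make the arithmetic behind "a few 10% bugs can wedge the gate" concrete, here is a minimal sketch. It assumes each nondeterministic bug fails a gate run independently of the others, which is a simplification. The function names and the 10% rates are illustrative assumptions, not measurements from the gate.</div>

```python
def pass_probability(failure_rates):
    """Chance that a single gate run passes, assuming every
    nondeterministic bug fails independently (a simplification)."""
    probability = 1.0
    for rate in failure_rates:
        probability *= 1.0 - rate
    return probability


def slip_through_probability(failure_rate, attempts):
    """Chance that a patch which fails `failure_rate` of the time
    passes at least once in `attempts` gate runs (i.e. with rechecks)."""
    return 1.0 - failure_rate ** attempts


# Hypothetical numbers for illustration only.
one_bug = pass_probability([0.10])          # a single 10% bug
four_bugs = pass_probability([0.10] * 4)    # four 10% bugs stacked
with_recheck = slip_through_probability(0.10, attempts=2)

print(f"pass rate with one bug:   {one_bug:.2%}")
print(f"pass rate with four bugs: {four_bugs:.2%}")
print(f"flaky patch merges within two tries: {with_recheck:.2%}")
```

<div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px">Under these assumptions, four independent 10% bugs drop the per-run pass rate to roughly 66%, while a patch that introduces a 10% bug still merges within two attempts about 99% of the time. That asymmetry is why such bugs accumulate.</div>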
<div id="magicdomid224" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid289" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Side note: This is not the first time we have wedged the gate. The first time was around September 26th, right when we were cutting Havana release candidates. In response we wrote elastic-recheck (</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://status.openstack.org/elastic-recheck/" style="margin:0px;padding:0px">http://status.openstack.org/elastic-recheck/</a></span><span class="" style="margin:0px;padding:0px 0px 1px">) to better track which bugs we were seeing.</span></span></div>
<div id="magicdomid58" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid280" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Gate stability according to Graphite: </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://paste.openstack.org/show/53765/" style="margin:0px;padding:0px">http://paste.openstack.org/show/53765/</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> (the Graphite URLs are huge</span><span class="" style="margin:0px;padding:0px 0px 1px"> because they encode entire queries,</span><span class="" style="margin:0px;padding:0px 0px 1px"> so we are including them as a pastebin).</span></span></div>
<div id="magicdomid60" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid837" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">After sending out an email asking for help fixing the top known gate bugs (</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html" style="margin:0px;padding:0px">http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html</a></span><span class="" style="margin:0px;padding:0px 0px 1px">), we had a few possible fixes. But with the gate wedged, the merge queue was 145 patches long and could take days to process: in the worst case, with none of the patches merging, it would take about 1 hour per patch. So on November 20th we asked for a freeze on any non-critical bug fixes </span><span class="" style="margin:0px;padding:0px 0px 1px">(</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html" style="margin:0px;padding:0px">http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html</a></span><span class="" style="margin:0px;padding:0px 0px 1px">)</span><span class="" style="margin:0px;padding:0px 0px 1px">, kicked everything out of the merge queue, and put our possible bug fixes at the front. Even with these drastic measures it still took 26 hours to finally unwedge the gate. In those 26 hours we got the check queue failure rate (always higher than the gate failure rate) down from around 87% to below 10%. We still have many more bugs to track down and fix in order to improve gate stability.</span></span></div>
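<div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><br></div><div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px">The "could take days" estimate follows directly from the queue length and the per-patch time quoted in this email. A minimal sketch of that worst-case arithmetic, where the function name is hypothetical and the only inputs are the 145-patch queue and roughly 1 hour per patch:</div>

```python
def worst_case_drain_hours(queue_length, hours_per_patch=1.0):
    """Worst case: no patch merges, so each one costs a full test cycle."""
    return queue_length * hours_per_patch


hours = worst_case_drain_hours(145)   # queue length quoted in this email
days = hours / 24.0
print(f"worst case: {hours:.0f} hours, about {days:.1f} days")
```

<div style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px">That is 145 hours, or about six days, just to work through the existing queue, which is why the candidate fixes had to be moved to the front.</div>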
<div id="magicdomid62" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid63" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid1409" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">8 major bug fixes later, we have the gate back to a reasonable failure rate. But how did things get so bad? I'm glad you asked; here is a blow-by-blow account.</span></div>
<div id="magicdomid1411" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid1428" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">The gate has not been completely stable for a very long time, and it only took two new bugs to wedge it. Starting with the list of bugs we identified via elastic-recheck, we fixed 4 bugs that had been in the gate for a few weeks already.</span></div>
<div id="magicdomid65" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid66" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid537" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"> </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/bugs/1224001" style="margin:0px;padding:0px">https://bugs.launchpad.net/bugs/1224001</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "test_network_basic_ops fails waiting for network to become available"</span></span></li>
</ul></div><div id="magicdomid538" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/57290" style="margin:0px;padding:0px">https://review.openstack.org/57290</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> was the fix which depended on </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/53188" style="margin:0px;padding:0px">https://review.openstack.org/53188</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> and </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/57475" style="margin:0px;padding:0px">https://review.openstack.org/57475</a></span><span class="" style="margin:0px;padding:0px 0px 1px">.</span></span></li>
</ul></div><div id="magicdomid539" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">This fixed a race condition where the VM did not receive its IP address from DHCP in time. Minimize polling on the agent now defaults to True, which should consistently reduce the time needed to configure an interface on br-int.</span></li>
</ul></div><div id="magicdomid540" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/bugs/1252514" style="margin:0px;padding:0px">https://bugs.launchpad.net/bugs/1252514</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "Swift returning errors when setup using devstack"</span></span></li>
</ul></div><div id="magicdomid541" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57373/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57373/</a></span></span></li>
</ul></div><div id="magicdomid542" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">There were a few swift-related problems that were sorted out as well. Most had to do with tuning swift properly for its use as a glance backend in the gate, ensuring that timeout values were appropriate for the devstack test slaves. In resource-constrained environments, the swift default timeouts could be tripped frequently, even though logs showed the request would have finished successfully given enough time. Swift also had a race condition in how it constructed its sqlite3 files for containers and accounts, where it was not retrying operations when the database was locked.</span></li>
</ul></div><div id="magicdomid545" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/swift/+bug/1243973" style="margin:0px;padding:0px">https://bugs.launchpad.net/swift/+bug/1243973</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "Simultaneous PUT requests for the same account..."</span></span></li>
</ul></div><div id="magicdomid546" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57019/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57019/</a></span></span></li>
</ul></div><div id="magicdomid547" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">This was not on our original list of bugs, but while in bug fix mode, we got this one fixed as well</span></li>
</ul></div><div id="magicdomid548" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/bugs/1251784" style="margin:0px;padding:0px">https://bugs.launchpad.net/bugs/1251784</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached"</span></span></li>
</ul></div><div id="magicdomid549" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57509/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57509/</a></span></span></li>
</ul></div><div id="magicdomid550" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Uncovered on the mailing list (</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://lists.openstack.org/pipermail/openstack-dev/2013-November/019906.html" style="margin:0px;padding:0px">http://lists.openstack.org/pipermail/openstack-dev/2013-November/019906.html</a></span><span class="" style="margin:0px;padding:0px 0px 1px">)</span></span></li>
</ul></div><div id="magicdomid551" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">Nova had a very old version of oslo's local.py which is used for managing references to local variables in coroutines. The old version had a pretty significant bug that basically meant non-weak references to variables were not managed properly. This fix has made the nova neutron interactions much more reliable.</span></li>
</ul></div><div id="magicdomid554" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">This fixed the number 2 bug on our list of top gate bugs (</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html" style="margin:0px;padding:0px">http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> )!</span></span></li>
</ul></div><div id="magicdomid83" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid1434" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">In addition to fixing 4 old bugs, we fixed two new bugs that were introduced / exposed this week.</span></div>
<div id="magicdomid1436" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid2804" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/bugs/1251920" style="margin:0px;padding:0px">https://bugs.launchpad.net/bugs/1251920</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "Tempest failures due to failure to return console logs from an instance Project"</span></span></li>
</ul></div><div id="magicdomid2805" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Bug: </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/54363/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/54363/</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> [Tempest]</span></span></li>
</ul></div><div id="magicdomid2806" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix(work around): </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57193/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57193/</a></span></span></li>
</ul></div><div id="magicdomid2807" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">After many false starts and banging our heads against the wall, we identified a change to tempest, </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/54363" style="margin:0px;padding:0px">https://review.openstack.org/54363</a></span><span class="" style="margin:0px;padding:0px 0px 1px">, that added a new test around the same time as bug 1251920 became a problem. Forcing tempest to skip this test produced a very high success rate without any 1251920-related failures. As a result we are working around this bug by skipping that test until it can be run without major impact to the gate.</span></span></li>
</ul></div><div id="magicdomid2816" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">The change that introduced this problematic test had to go through the gate four times before it would merge, though only one of the 3 failed attempts appears to have triggered 1251920.</span><span class="" style="margin:0px;padding:0px 0px 1px"> Or as Jeremy Stanley (fungi) said, "nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so </span><span class="" style="margin:0px;padding:0px 0px 1px">e</span><span class="" style="margin:0px;padding:0px 0px 1px">ven when it's their patch which is introducing a nondeterministic bug."</span></span></li>
</ul></div><div id="magicdomid2809" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://bugs.launchpad.net/bugs/1252170" style="margin:0px;padding:0px">https://bugs.launchpad.net/bugs/1252170</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> "tempest.scenario test_resize_server_confirm failed in grenade"</span></span></li>
</ul></div><div id="magicdomid2810" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57357/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57357/</a></span></span></li>
</ul></div><div id="magicdomid2811" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Fix </span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="https://review.openstack.org/#/c/57572/" style="margin:0px;padding:0px">https://review.openstack.org/#/c/57572/</a></span></span></li>
</ul></div><div id="magicdomid2812" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 3em;padding:0px;list-style-type:circle">
<li style="margin:0px;padding:0px"><span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">First we started running post Grenade upgrade tests in parallel</span><span class="" style="margin:0px;padding:0px 0px 1px"> (to fix another bug),</span><span class="" style="margin:0px;padding:0px 0px 1px"> which would normally be fine, but Grenade wasn't configuring the small flavors typically used by tempest, so it was possible for the devstack Jenkins slaves to run out of memory when starting many larger VMs in parallel. To fix this, devstack lib/tempest has been updated to create the flavors only if they don't exist, and Grenade now allows tempest to use its default instance flavors.</span></span></li>
</ul></div><div id="magicdomid85" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid3154" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span style="background-color:rgb(255,255,255)"><br style="margin:0px;padding:0px">
</span></div><div id="magicdomid4443" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">Now that we have the gate back into working order, we are working on the next steps to prevent this from happening again. The two most immediate changes are:</span></div>
<div id="magicdomid4455" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span style="background-color:rgb(255,255,255)"><span class="" style="margin:0px;padding:0px 0px 1px">Doing a better job of triaging gate bugs (</span><span class="" style="margin:0px;padding:0px 0px 1px"><a href="http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html" style="margin:0px;padding:0px">http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html</a></span><span class="" style="margin:0px;padding:0px 0px 1px"> ). </span></span></li>
</ul></div><div id="magicdomid4456" class="" style="margin:0px;padding:0px;color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:16px"><ul class="" style="margin:0px 0px 0px 1.5em;padding:0px"><li style="margin:0px;padding:0px">
<span class="" style="margin:0px;padding:0px 0px 1px;background-color:rgb(255,255,255)">In the next few days we will remove 'reverify no bug' (although you will still be able to run 'reverify bug x').</span></li>
</ul><div><br></div><div>Best,</div><div>Joe Gordon</div><div>Clark Boylan</div><div></div></div></div>