<div dir="ltr">I've found out that several jobs are exhibiting failures like bug 1254890 [1] and bug 1253896 [2] because openvswitch seem to be crashing the kernel.<div>The kernel trace reports as offending process usually either neutron-ns-metadata-proxy or dnsmasq, but [3] seem to clearly point to ovs-vsctl.</div>

254 events observed in the previous 6 days show a similar trace in the
logs [4]. While this alone won't explain all the observed failures, it is
potentially one of the prominent root causes.
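
For anyone who wants to reproduce that count, here is a minimal sketch that
runs the query from [4] directly against an Elasticsearch search endpoint. The
endpoint URL is only a placeholder assumption (not a documented interface), and
the 6-day window the Kibana UI applies is not reproduced here:

import json

import requests  # third-party, but widely available

# Placeholder assumption: adjust to wherever the Elasticsearch API behind
# logstash.openstack.org is actually exposed.
ES_SEARCH_URL = "http://logstash.openstack.org/elasticsearch/_search"

query = {
    "size": 0,  # only the total hit count is needed, not the documents
    "query": {
        "query_string": {
            # Same query string as [4]; no time window is applied here.
            "query": ('"kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917"'
                      ' AND filename:syslog.txt')
        }
    },
}

resp = requests.post(ES_SEARCH_URL, data=json.dumps(query),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
print("matching events: %s" % resp.json()["hits"]["total"])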

The logs give me few hints about the kernel that is running. It seems there has
been no update in the past 7 days, but I can't be sure. Open vSwitch builds are
updated periodically; the last build I found that does not trigger the failures
was generated on 2014/01/16 at 01:58:18. Unfortunately, version-wise I always
see only 1.4.0, with no build number.
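
If it helps narrow this down, here is a minimal sketch (assuming shell access
to an affected node) for recording the exact kernel and OVS versions, including
the Ubuntu package build suffix that the bare "1.4.0" from ovs-vsctl does not
show:

import subprocess

def run(cmd):
    """Return a command's output, or a short note if it cannot be run."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT,
                                       universal_newlines=True).strip()
    except (OSError, subprocess.CalledProcessError) as exc:
        return "unavailable (%s)" % exc

print("kernel:         %s" % run(["uname", "-r"]))
print("ovs-vsctl:      %s" % run(["ovs-vsctl", "--version"]))
# The Debian/Ubuntu package version carries the build suffix that the bare
# "1.4.0" reported by ovs-vsctl does not.
print("ovs package:    %s" % run(["dpkg-query", "-W", "-f=${Version}",
                                  "openvswitch-switch"]))
# Datapath module available to the kernel (on older OVS builds the module may
# be named openvswitch_mod instead of openvswitch).
print("ovs kernel mod: %s" % run(["modinfo", "-F", "version", "openvswitch"]))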

I don't know whether this will require getting in touch with Ubuntu, or whether
we can just prepare a different image with an OVS build known to work without
problems.

Salvatore

[1] https://bugs.launchpad.net/neutron/+bug/1254890
[2] https://bugs.launchpad.net/neutron/+bug/1253896
[3] http://paste.openstack.org/show/61869/
[4] "kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917" and filename:syslog.txt
<br><br><div class="gmail_quote">On 24 January 2014 21:13, Clay Gerrard <span dir="ltr"><<a href="mailto:clay.gerrard@gmail.com" target="_blank">clay.gerrard@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">OH yeah that's much better. I had found those eventually but had to dig through all that other stuff :'(<div><br></div><div>Moving forward I think we can keep an eye on that page, open bugs for those tests causing issue and dig in.<br>

Thanks again!

-Clay


On Fri, Jan 24, 2014 at 11:37 AM, Sean Dague <sean@dague.net> wrote:

On 01/24/2014 02:02 PM, Peter Portante wrote:
> Hi Sean,
>
> In the last 7 days I see only 6 python27 based test
> failures: http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRVJST1I6ICAgcHkyNzogY29tbWFuZHMgZmFpbGVkXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA1ODk2Mjk0MDR9
>
> And 4 python26 based test
> failures: http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRVJST1I6ICAgcHkyNjogY29tbWFuZHMgZmFpbGVkXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA1ODk1MzAzNTd9
>
> Maybe the query you posted captures failures where the job did not even run?
>
> And only 15 hits (well, 18, but three are within the same job, and some
> of the tests are run twice, so it is a combined 10
> hits): http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRkFJTDpcIiBhbmQgbWVzc2FnZTpcInRlc3RcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDU4OTg1NTAzMX0=
>
> Thanks,
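
As a side note, those logstash links carry the whole query as base64-encoded
JSON in the URL fragment; here is a minimal sketch, standard library only, that
decodes one of them (pass everything after the '#' as the argument):

import base64
import json
import sys

def decode_logstash_fragment(fragment):
    """Return the Kibana state dict encoded after the '#' in a logstash URL."""
    fragment += "=" * (-len(fragment) % 4)  # restore any stripped padding
    return json.loads(base64.b64decode(fragment).decode("utf-8"))

if __name__ == "__main__":
    state = decode_logstash_fragment(sys.argv[1])
    print(state["search"])     # e.g. project:"openstack/swift" AND build_queue:gate ...
    print(state["timeframe"])  # window in seconds, e.g. "604800" for the last 7 days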

So it is true that Interrupted exceptions (which happen when a job is killed
because of a reset) are sometimes turned into Fail events by the system. That
is one of the reasons the graphite data for failures is incorrect; if you use
just the graphite sourcing for fails, your numbers will be overly pessimistic.

The following are probably better lists:
- http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-swift-python26 (7 uncategorized fails)
- http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-swift-python27 (5 uncategorized fails)

-Sean

--
Sean Dague
Samsung Research America
sean@dague.net / sean.dague@samsung.com
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev