[openstack-dev] flaky tempest -- Top Offenders

Sean Dague sean at dague.net
Fri Sep 27 11:30:21 UTC 2013


On 09/26/2013 09:41 PM, Joe Gordon wrote:
> Hi All,
>
> As many of you may have suspected, the gate has gotten less stable in
> the past few days.  Turns out we have the numbers to prove it too!
>
> http://graphite.openstack.org/graphlot/?width=586&from=00%3A00_20130919&_salt=1380244287.508&height=308&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.FAILURE%2C%2224h%22)&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.SUCCESS%2C%2224h%22)&until=23%3A59_20130926&lineMode=staircase
>
> So tempest started failing more right around the 24th, even though we
> are in FeatureFreeze.
>
> "FF ensures that sufficient share of the ReleaseCycle
> <https://wiki.openstack.org/wiki/ReleaseCycle> is dedicated to QA, until
> we produce the first release candidates. Limiting the changes that
> affect the behavior of the software allow for consistent testing and
> efficient bugfixing."
>
> https://wiki.openstack.org/wiki/FeatureFreeze
>
> Thanks to the work we have been doing with logstash and elastic-recheck,
> we have very good numbers on the top offenders and when they began. The
> good news is that two bugs account for most of the hits, so the top
> offenders list has just two entries. But there are still other unknown
> bugs and lower priority ones out there too!
>
>
> https://bugs.launchpad.net/tempest/+bug/1226337 -- Launchpad bug 1226337
> in tempest
> "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake
> failure" [High,Triaged]
>
> Started on 9-23, with 408 hits in the last 24 hours alone!
>
> http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIk5vdmFFeGNlcHRpb246IGlTQ1NJIGRldmljZSBub3QgZm91bmQgYXRcIiBBTkQgQGZpZWxkcy5idWlsZF9zdGF0dXM6XCJGQUlMVVJFXCIgQU5EIEBmaWVsZHMuZmlsZW5hbWU6XCJsb2dzL3NjcmVlbi1uLWNwdS50eHRcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDI0NDY2ODQ5Nn0=
>
>
> https://bugs.launchpad.net/tempest/+bug/1230407  -- Launchpad bug
> 1230407 in neutron "State change timeout exceeded" [Undecided,Confirmed]
>
> Started on 9-25 with 66 hits in the last 24 hours alone
>
> http://logstash.openstack.org/#eyJzZWFyY2giOiIgQG1lc3NhZ2U6XCJBc3NlcnRpb25FcnJvcjogU3RhdGUgY2hhbmdlIHRpbWVvdXQgZXhjZWVkZWQhXCIgQU5EIEBmaWVsZHMuYnVpbGRfc3RhdHVzOlwiRkFJTFVSRVwiIEFORCBAZmllbGRzLmZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODAyNDQ0MzM2NzZ9

This second one looks like an issue with the Neutron DB layer, which
appears to deadlock on agent updates:
http://logs.openstack.org/87/47487/4/check/gate-tempest-devstack-vm-neutron/4128a28/logs/screen-q-svc.txt.gz?level=TRACE

So DB assistance would be good.

I've set that bug to Critical and RC1 for Neutron, because right now
it's bouncing at least 50% of the changes out of the gate. As a result
we're starving the check queue of devstack nodes, so no changes have
made progress over there for 12 hours.
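
For anyone who wants to see what the logstash links quoted above are
searching for: the part after the '#' appears to be base64-encoded JSON,
so it can be decoded locally. A minimal Python sketch, assuming that
encoding holds for every such link. The function names are made up for
illustration, and encode_logstash_fragment exists only to build a sample
link to decode:

```python
import base64
import json
from urllib.parse import urlsplit


def decode_logstash_fragment(url):
    """Return the query state stored in a logstash URL fragment as a dict.

    Assumption: the fragment is base64-encoded JSON.
    """
    fragment = urlsplit(url).fragment
    # Restore any '=' padding that was dropped from the end of the fragment.
    padded = fragment + "=" * (-len(fragment) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))


def encode_logstash_fragment(base_url, state):
    """Hypothetical inverse helper, used here only to build a sample link."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base_url + "#" + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    sample = {
        "search": '@message:"State change timeout exceeded!"',
        "timeframe": "604800",
    }
    link = encode_logstash_fragment("http://logstash.openstack.org/", sample)
    # Print the search string recovered from the sample link.
    print(decode_logstash_fragment(link)["search"])
```

Passing one of the full links from the quoted mail to
decode_logstash_fragment should return a dict whose "search" key holds
the query string, provided the assumption about the encoding is right.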

	-Sean

-- 
Sean Dague
http://dague.net


