[openstack-dev] Top Gate Bugs

Davanum Srinivas davanum at gmail.com
Fri Dec 6 21:31:11 UTC 2013


Joe,

Looks like we may be a bit more stable now?

Short URL: http://bit.ly/18qq4q2

Long URL : http://graphite.openstack.org/graphlot/?from=-120hour&until=-0hour&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-postgres-full'),'ED9121')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00F0F0')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron'),'00FF00')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00c868')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-grenade-dsvm.SUCCESS,sum(stats.zuul.pipeline.check.job.check-grenade-dsvm.{SUCCESS,FAILURE})),'6hours'),%20'check-grenade-dsvm'),'800080')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'E080FF')
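For anyone who wants to tweak that graph: every target in the long URL follows
the same pattern, namely the job's SUCCESS counter as a percentage of
SUCCESS+FAILURE, smoothed with a 6-hour moving average. Something along these
lines (untested, helper name made up) would generate one target per job:

    def graphite_target(job, color, pipeline='gate', window='6hours'):
        # stats.zuul.pipeline.<pipeline>.job.<job>.{SUCCESS,FAILURE} counters
        base = 'stats.zuul.pipeline.%s.job.%s' % (pipeline, job)
        pct = 'asPercent(%s.SUCCESS,sum(%s.{SUCCESS,FAILURE}))' % (base, base)
        # 6-hour moving average, labelled with the job name, drawn in <color>
        return "color(alias(movingAverage(%s,'%s'),'%s'),'%s')" % (
            pct, window, job, color)

    print(graphite_target('gate-tempest-dsvm-full', 'ED9121'))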

-- dims


On Fri, Dec 6, 2013 at 11:28 AM, Matt Riedemann
<mriedem at linux.vnet.ibm.com> wrote:
>
>
> On Wednesday, December 04, 2013 7:22:23 AM, Joe Gordon wrote:
>>
>> TL;DR: Gate is failing 23% of the time due to bugs in nova, neutron
>> and tempest. We need help fixing these bugs.
>>
>>
>> Hi All,
>>
>> Before going any further we have a bug that is affecting gate and
>> stable, so it's getting top priority here. elastic-recheck currently
>> doesn't track unit tests because we don't expect them to fail very
>> often. Turns out that assessment was wrong; we now have a nova py27
>> unit test bug in both the trunk and stable gates.
>>
>> https://bugs.launchpad.net/nova/+bug/1216851
>> Title: nova unit tests occasionally fail migration tests for mysql and
>> postgres
>> Hits
>>   FAILURE: 74
>> The failures appear multiple times for a single job, and some of those
>> are due to bad patches in the check queue.  But this is being seen in
>> the stable and trunk gates, so something is definitely wrong.
>>
>> =======
>>
>>
>> It's time for another edition of 'Top Gate Bugs.'  I am sending this
>> out now because in addition to our usual gate bugs a few new ones have
>> cropped up recently, and as we saw a few weeks ago it doesn't take
>> very many new bugs to wedge the gate.
>>
>> Currently the gate has a failure rate of at least 23%! [0]
>>
>> Note: this email was generated with
>> http://status.openstack.org/elastic-recheck/ and
>> 'elastic-recheck-success' [1]
>>
>> 1) https://bugs.launchpad.net/bugs/1253896
>> Title: test_minimum_basic_scenario fails with SSHException: Error
>> reading SSH protocol banner
>> Projects:  neutron, nova, tempest
>> Hits
>>   FAILURE: 324
>> This one has been around for several weeks now and although we have
>> made some attempts at fixing this, we aren't any closer to resolving
>> this than we were a few weeks ago.
>>
>> 2) https://bugs.launchpad.net/bugs/1251448
>> Title: BadRequest: Multiple possible networks found, use a Network ID
>> to be more specific.
>> Project: neutron
>> Hits
>>   FAILURE: 141
>>
>> 3) https://bugs.launchpad.net/bugs/1249065
>> Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
>> Project: nova
>> Hits
>>   FAILURE: 112
>> This is a bug in nova's neutron code.
>>
>> 4) https://bugs.launchpad.net/bugs/1250168
>> Title: gate-tempest-devstack-vm-neutron-large-ops is failing
>> Projects: neutron, nova
>> Hits
>>   FAILURE: 94
>> This is an old bug that was fixed, but came back on December 3rd. So
>> this is a recent regression. This may be an infra issue.
>>
>> 5) https://bugs.launchpad.net/bugs/1210483
>> Title: ServerAddressesTestXML.test_list_server_addresses FAIL
>> Projects: neutron, nova
>> Hits
>>   FAILURE: 73
>> This has had some attempts made at fixing it, but it's still around.
>>
>>
>> In addition to the existing bugs, we have some new bugs on the rise:
>>
>> 1) https://bugs.launchpad.net/bugs/1257626
>> Title: Timeout while waiting on RPC response - topic: "network", RPC
>> method: "allocate_for_instance" info: "<unknown>"
>> Project: nova
>> Hits
>>   FAILURE: 52
>> This is a large-ops-only bug. It has been around for at least two weeks,
>> but we have seen it in higher numbers starting around December 3rd. This
>> may be an infrastructure issue, as the neutron-large-ops job started
>> failing more around the same time.
>>
>> 2) https://bugs.launchpad.net/bugs/1257641
>> Title: Quota exceeded for instances: Requested 1, but already used 10
>> of 10 instances
>> Projects: nova, tempest
>> Hits
>>   FAILURE: 41
>> Like the previous bug, this has been around for at least two weeks but
>> appears to be on the rise.
>>
>>
>>
>> Raw Data: http://paste.openstack.org/show/54419/
>>
>>
>> best,
>> Joe
>>
>>
>> [0] failure rate = 1-(success rate gate-tempest-dsvm-neutron)*(success
>> rate ...) * ...
>>
>> gate-tempest-dsvm-neutron = 0.00
>> gate-tempest-dsvm-neutron-large-ops = 11.11
>> gate-tempest-dsvm-full = 11.11
>> gate-tempest-dsvm-large-ops = 4.55
>> gate-tempest-dsvm-postgres-full = 10.00
>> gate-grenade-dsvm = 0.00
>>
>> (I hope I got the math right here)
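Reading those as per-job failure percentages, the math in [0] is just one
minus the product of the per-job success rates, since a change only merges if
every job in the queue passes. A quick untested sketch; with the numbers above
it actually comes out a bit higher than the 23% headline, which presumably
came from a slightly different snapshot:

    # Untested sketch of the formula in [0]; the per-job numbers above are
    # treated as failure percentages.
    job_failure_pct = {
        'gate-tempest-dsvm-neutron': 0.00,
        'gate-tempest-dsvm-neutron-large-ops': 11.11,
        'gate-tempest-dsvm-full': 11.11,
        'gate-tempest-dsvm-large-ops': 4.55,
        'gate-tempest-dsvm-postgres-full': 10.00,
        'gate-grenade-dsvm': 0.00,
    }

    overall_success = 1.0
    for pct in job_failure_pct.values():
        overall_success *= 1.0 - pct / 100.0

    print('overall gate failure rate: %.1f%%' % ((1.0 - overall_success) * 100))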
>>
>> [1]
>>
>> http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/elastic_recheck/cmd/check_success.py
>>
>>
>
>
> Let's add bug 1257644 [1] to the list.  I'm pretty sure this is due to some
> recent code [2][3] in the nova libvirt driver that is automatically
> disabling the host when the libvirt connection drops.
>
> Joe said there was a known issue with libvirt connection failures so this
> could be duped against that, but I'm not sure where/what that one is - maybe
> bug 1254872 [4]?
>
> Unless I just don't understand the code, there is some funny logic going on
> in the libvirt driver when it's automatically disabling a host, which I've
> documented in bug 1257644.  It would help to have some libvirt-minded people
> look at that, or the authors/approvers of those patches.
>
> Also, does anyone know if libvirt will pass a 'reason' string to the
> _close_callback function?  I was digging through the libvirt code this
> morning but couldn't figure out where the callback is actually called and
> with what parameters.  The code in nova seemed to just be based on the patch
> that danpb had in libvirt [5].
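For what it's worth, in libvirt-python the close callback gets an integer
reason code (one of the VIR_CONNECT_CLOSE_REASON_* constants), not a free-form
string, as far as I can tell. Rough, untested example of registering one:

    import libvirt

    def close_cb(conn, reason, opaque):
        # reason is an int: VIR_CONNECT_CLOSE_REASON_{ERROR,EOF,KEEPALIVE,CLIENT}
        print('libvirt connection closed, reason=%d' % reason)

    # an event loop implementation must be registered for the callback to fire
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly('qemu:///system')
    conn.registerCloseCallback(close_cb, None)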
>
> This bug is going to raise a bigger long-term question about the need for
> having a new column in the Service table for indicating whether or not the
> service was automatically disabled, as Phil Day points out in bug 1250049
> [6].  That way the ComputeFilter in the scheduler could handle that case a
> bit differently, at least from a logging/serviceability standpoint, e.g.
> info/warning level message vs debug.
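Purely as a strawman for what Phil is suggesting in [6]: if the Service record
carried something like a 'disabled_automatically' flag (made-up name), the
ComputeFilter could log the two cases differently, roughly along these lines
(hypothetical, not the actual filter code):

    import logging

    LOG = logging.getLogger(__name__)

    def host_passes(host_state, service):
        # hypothetical sketch of the ComputeFilter idea in bug 1250049
        if not service['disabled']:
            return True
        if service.get('disabled_automatically'):
            LOG.info("Host %s was auto-disabled (e.g. lost libvirt "
                     "connection), skipping", host_state.host)
        else:
            LOG.debug("Host %s disabled by operator, skipping",
                      host_state.host)
        return False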
>
> [1] https://bugs.launchpad.net/nova/+bug/1257644
> [2] https://review.openstack.org/#/c/52189/
> [3] https://review.openstack.org/#/c/56224/
> [4] https://bugs.launchpad.net/nova/+bug/1254872
> [5] http://www.redhat.com/archives/libvir-list/2012-July/msg01675.html
> [6] https://bugs.launchpad.net/nova/+bug/1250049
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Davanum Srinivas :: http://davanum.wordpress.com


