[openstack-dev] [qa] [neutron] Neutron Full Parallel job very close to voting - call to arms by neutron team

Salvatore Orlando sorlando at nicira.com
Mon Feb 24 13:18:22 UTC 2014


Hi Rossella,

I had no idea most of the bugs were assigned to me.

I have pushed several patches for bug 1253896, and that's why Launchpad
states that I "own" the ticket.
But if you find another fault causing that bug, feel free to push a patch
for it.
I think today I will push only a patch for bug 1283518; I won't be able to
work on any of the others, so feel free to pick whichever bugs you want!

I will make sure to de-assign myself from all the other bugs. It would be a
shame if contributors were turned away because of this!

Salvatore

PS: the correct link is https://bugs.launchpad.net/neutron/+bug/1283533



On 24 February 2014 11:14, Rossella Sblendido <rsblendido at suse.com> wrote:

>  Ciao Salvatore,
>
> thanks a lot for analyzing the failures!
>
> This link is not working for me:
> 7) https://bugs.launchpad.net/neutron/+bug/1253533
>
> I took a minor bug that was not assigned. Most of the bugs are assigned to
> you, so I was wondering if you could use some help. I guess we can coordinate
> better when you are online.
>
> cheers,
>
> Rossella
>
>
> On 02/23/2014 03:14 AM, Salvatore Orlando wrote:
>
> I have tried to collect more information on neutron full job failures.
>
>  So far there have been 219 failures and 891 successes, for an overall
> failure rate of about 19.7%, which is in line with Sean's evaluation.
> The count was performed exclusively on jobs executed against the master
> branch. The failure rate for stable/havana is higher; indeed, the job there
> still triggers bug 1273386, as it performs nbd mounting, and several fixes
> for the l2/l3 agents were not backported (or are not backportable).
>
>  It is worth noting that some of the failures were actually due to infra
> issues. Unfortunately, it is not obvious to me how to define a logstash
> query for those. Nevertheless, it is better to err on the side of caution
> and estimate the failure rate at about 20%.
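>
> As an illustration only (the patterns below are placeholders, not verified
> gate failure signatures), one crude alternative to a logstash query would be
> to export the console logs of the failed runs and bucket them by message
> pattern, for example:
>
>     import re
>
>     # Illustrative patterns only -- not actual infra failure signatures.
>     INFRA_PATTERNS = [re.compile(p) for p in (
>         r"Could not resolve host",     # mirror/DNS trouble on the worker
>         r"No space left on device",    # worker disk exhaustion
>     )]
>
>     def classify(failed_runs):
>         """failed_runs: iterable of (build_uuid, console_text) tuples."""
>         infra, real = [], []
>         for build_uuid, console_text in failed_runs:
>             hit = any(p.search(console_text) for p in INFRA_PATTERNS)
>             (infra if hit else real).append(build_uuid)
>         return infra, real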
>
>  I then classified 63 failures and found the following:
> - 25 failures were due to infra issues and 1 failure was due to a flaw in a
> patch, leaving 37 "real" failures to analyse
>    * In the same timeframe 203 jobs succeeded, giving a potential failure
> rate, after excluding infra issues, of about 15.4% (37/240)
> - 2 bugs were responsible for 25 of these 37 failures
>    * they are the "SSH protocol banner" issue and the well-known DB lock
> timeouts
> - bug 1253896 (the infamous SSH timeout bug) was hit only twice. The
> elastic-recheck count is much higher because failures for the SSH protocol
> banner error (bug 1265495) are being misclassified as bug 1253896.
>    * indeed, in the past 48 hours only 2 voting neutron jobs hit this
> failure, which is probably a great improvement compared with a few weeks ago.
> - Some failures are due to bugs already known and tracked; others are due
> to bugs that were either unforeseen so far or not yet tracked. In the latter
> case, a bug report has been filed.
>
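> As a sanity check, here is the arithmetic behind the percentages above and
> below, using plain Python as a calculator (the "global" denominators are the
> 240 runs -- 37 real failures plus 203 successes -- observed in the same
> window):
>
>     failures, successes = 219.0, 891.0
>     print(failures / (failures + successes))  # ~0.197 -> ~19.7% overall failure rate
>
>     real = 63 - 25 - 1.0      # 37 "real" failures after excluding infra issues
>                               # and the one failure caused by a flawed patch
>     window = real + 203.0     # 240 runs in the same timeframe
>     print(real / window)      # ~0.154 -> ~15.4% failure rate excluding infra
>
>     print(16 / real, 16 / window)   # bug #1: ~43.2% of failures, ~6.7% globally
>     print(9 / real, 9 / window)     # bug #2: ~24.3% of failures, ~3.75% globally
>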
>  It seems therefore that there are two high priority bugs to address:
> 1) https://bugs.launchpad.net/neutron/+bug/1283522 (16 occurrences, 43.2%
> of failures, 6.67% globally)
>      * Check whether we can resume the discussion about splitting the API
> server and the RPC server
> 2) https://bugs.launchpad.net/neutron/+bug/1265495 (9/37 = 24.3% of
> failures, 3.75% globally)
>
>  And there are several minor bugs (affecting tempest and/or neutron); each
> of the following bugs was found no more than twice in our analysis:
> 3) https://bugs.launchpad.net/neutron/+bug/1254890 (possibly a nova bug,
> but it hit the neutron full job once)
> 4) https://bugs.launchpad.net/neutron/+bug/1283599
> 5) https://bugs.launchpad.net/neutron/+bug/1277439
> 6) https://bugs.launchpad.net/neutron/+bug/1253896
> 7) https://bugs.launchpad.net/neutron/+bug/1253533
>  8) https://bugs.launchpad.net/tempest/+bug/1283535 (possibly not a
> neutron bug)
> 9) https://bugs.launchpad.net/tempest/+bug/1253993 (need to devise new
> solutions for improving agent loop times)
>    * there is already a patch under review for bulking device details
> requests (a rough sketch of the idea follows this list)
> 10) https://bugs.launchpad.net/neutron/+bug/1283518
>
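> To make the "bulking" idea for bug #9 concrete, here is a minimal sketch;
> the method names are hypothetical and do not necessarily match the patch
> under review or the actual neutron agent RPC API:
>
>     # Before: one RPC round trip per device in the agent loop.
>     def process_devices_serial(plugin_rpc, context, devices, agent_id):
>         details = []
>         for device in devices:
>             # Each call blocks on a separate round trip to the server.
>             details.append(plugin_rpc.get_device_details(context, device, agent_id))
>         return details
>
>     # After: a single bulk call returns the details for all devices at once,
>     # trimming the per-device RPC overhead from each agent loop iteration.
>     def process_devices_bulk(plugin_rpc, context, devices, agent_id):
>         return plugin_rpc.get_devices_details_list(context, devices, agent_id)
>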
>  In my humble opinion, it is therefore important to immediately have a
> plan for ensuring bugs #1 and #2 are solved, or at least consistently
> mitigated, by Icehouse. It would also be good to identify assignees for
> bugs #3 through #10.
>
>  Regards,
> Salvatore
>
>
> On 21 February 2014 14:44, Sean Dague <sean at dague.net> wrote:
>
>> Yesterday during the QA meeting we realized that the neutron full job,
>> which includes tenant isolation and full parallelism, was passing quite
>> often in the experimental queue. That was actually news to most of us,
>> as no one had been keeping a close eye on it.
>>
>> I moved it to a non-voting job on all projects. A spot check overnight
>> shows that it's failing about twice as often as the regular neutron job,
>> which is too high a failure rate to make it voting, but it's close.
>>
>> This would be the time for a final hard push by the neutron team to get
>> to the bottom of these failures and bring the pass rate to the level of
>> the existing neutron job; then we could make neutron full voting.
>>
>> This is a *huge* move forward from where things were at the Havana
>> summit. I want to thank the Neutron team for getting so aggressive about
>> getting this testing working. I was skeptical we could get there within
>> the cycle, but a last push could actually get us neutron parity in the
>> gate by i3.
>>
>>         -Sean
>>
>> --
>> Sean Dague
>> Samsung Research America
>> sean at dague.net / sean.dague at samsung.com
>> http://dague.net
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>

