[openstack-dev] [qa] [neutron] Neutron Full Parallel job - Last 4 days failures

Eugene Nikanorov enikanorov at mirantis.com
Sat Mar 29 16:36:11 UTC 2014


Bug 1294603 has its root cause in LBaaS, and should be fixed by
https://review.openstack.org/#/c/81537/

Thanks,
Eugene.


On Fri, Mar 28, 2014 at 7:29 PM, Matt Riedemann
<mriedem at linux.vnet.ibm.com> wrote:

>
>
> On 3/27/2014 8:00 AM, Salvatore Orlando wrote:
>
>>
>> On 26 March 2014 19:19, James E. Blair <jeblair at openstack.org> wrote:
>>
>>     Salvatore Orlando <sorlando at nicira.com> writes:
>>
>>     > On another note, we noticed that the duplicated jobs currently
>>     > executed for redundancy in neutron actually seem to point all to
>>     > the same build id. I'm not sure then if we're actually executing
>>     > each job twice or just duplicating lines in the jenkins report.
>>
>>     Thanks for catching that, and I'm sorry that didn't work right.
>>     Zuul is in fact running the jobs twice, but it is only looking at
>>     one of them when sending reports and (more importantly) deciding
>>     whether the change has succeeded or failed.  Fixing this is
>>     possible, of course, but it turns out to be a rather complicated
>>     change.  Since we don't make heavy use of this feature, I lean
>>     toward simply instantiating multiple instances of identically
>>     configured jobs and invoking them (e.g. "neutron-pg-1",
>>     "neutron-pg-2").
>>
>>     Matthew Treinish has already worked up a patch to do that, and I've
>>     written a patch to revert the incomplete feature from Zuul.
>>
>>
>> That makes sense to me. I think it is just a matter of how the results
>> are reported to gerrit, since from what I gather in logstash the jobs
>> are executed twice for each new patchset or recheck.
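>> To double check that, one could count the distinct build_uuids logged for
>> a given change and patchset; a rough sketch in Python (the Elasticsearch
>> endpoint, job name and field names below are assumptions for illustration,
>> not values verified in this thread):
>>
>>     import json
>>     import requests
>>
>>     # Assumed endpoint and query fields; adjust to the actual logstash setup.
>>     ES_URL = "http://logstash.example.org:9200/_search"
>>     query = {
>>         "size": 100,
>>         "query": {"query_string": {"query":
>>             'build_name:"check-tempest-dsvm-neutron-full" '
>>             'AND build_change:"CHANGE_NUMBER" AND build_patchset:"PATCHSET"'}},
>>     }
>>
>>     hits = requests.post(ES_URL, data=json.dumps(query)).json()["hits"]["hits"]
>>     build_uuids = {h["_source"].get("build_uuid") for h in hits}
>>     # Two distinct build_uuids for one patchset would confirm the job really
>>     # ran twice even though gerrit reported a single result.
>>     print(len(build_uuids), sorted(build_uuids))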
>>
>>
>> For the status of the full job, I took a look at the numbers reported by
>> Rossella.
>> All the bugs are already known; some of them are not even bugs; others
>> have been recently fixed (given the time span of Rossella's analysis, and
>> the fact that it also covers non-rebased patches, this kind of false
>> positive is possible).
>>
>> Of all full job failures, 44% should be discarded.
>> Bug 1291611 (12%) is definitely not a neutron bug... hopefully.
>> Bug 1281969 (12%) is really too generic.
>> It bears the hallmark of bug 1283522, and therefore the high number might
>> be due to the fact that trunk was plagued by that bug until a few days
>> before the analysis.
>> However, it's worth noting that there is also another instance of "lock
>> timeout" which has caused 11 failures in the full job in the past week.
>> A new bug has been filed for this issue:
>> https://bugs.launchpad.net/neutron/+bug/1298355
>> Bug 1294603 was related to a test which is now skipped. It is still being
>> debated whether the problem lies in the test design, neutron LBaaS, or
>> neutron L3.
>>
>> The following bugs do not seem to be neutron bugs:
>> 1290642, 1291920, 1252971, 1257885
>>
>> Bug 1292242 appears to have been fixed while the analysis was going on.
>> Bug 1277439 instead is already known to affect neutron jobs occasionally.
>>
>> The actual state of the job is perhaps better than what the raw numbers
>> say. I would keep monitoring it, and then make it voting after the
>> Icehouse release is cut, so that we'll be able to deal with a possibly
>> higher failure rate during the "quiet" period of the release cycle.
>>
>>
>>
>>     -Jim
>>
>>     _______________________________________________
>>     OpenStack-dev mailing list
>>     OpenStack-dev at lists.openstack.org
>>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>>
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
> I reported this bug [1] yesterday.  It was hit in our internal Tempest
> runs on RHEL 6.5 x86_64 with the nova libvirt driver and the neutron
> openvswitch ML2 driver.  We're running without tenant isolation on python
> 2.6 (no testr yet), so the tests run in serial.  We're running basically
> the full API/CLI/scenario test suites though, with no filtering on the
> smoke tag.
>
> Out of 1,971 tests run, we had 3 failures where a nova instance failed to
> spawn because the networking callback events failed, i.e. neutron sends a
> server event request to nova at a bad URL, so the nova API errors out and
> then the networking request in the neutron server fails as well.  As
> linked in the bug report, I'm seeing the same neutron server log error
> showing up in logstash for community jobs, but it's not a 100% failure.
> I haven't seen the n-api log error show up in logstash though.
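> For reference, the callback in question is roughly a POST to nova's
> os-server-external-events API; a minimal sketch (the URL, token and event
> payload here are placeholders for illustration, not values from the bug
> report):
>
>     import json
>     import requests
>
>     # Placeholder values; neutron builds this URL from its nova settings.
>     NOVA_URL = "http://127.0.0.1:8774/v2/TENANT_ID"
>     TOKEN = "ADMIN_TOKEN"
>
>     payload = {"events": [{
>         "name": "network-vif-plugged",   # neutron's nova notifier sends events like this
>         "server_uuid": "INSTANCE_UUID",
>         "tag": "PORT_UUID",
>         "status": "completed",
>     }]}
>
>     resp = requests.post(NOVA_URL + "/os-server-external-events",
>                          headers={"X-Auth-Token": TOKEN,
>                                   "Content-Type": "application/json"},
>                          data=json.dumps(payload))
>     print(resp.status_code, resp.reason)
>
> If the nova URL neutron is configured with is wrong, that POST fails and
> the port operation that triggered it errors out, which is what surfaces as
> the instance failing to spawn.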
>
> Just bringing this to people's attention in case anyone else sees it.
>
> [1] https://bugs.launchpad.net/nova/+bug/1298640
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

