[openstack-dev] [qa] [neutron] Neutron Full Parallel job - Last 4 days failures
Eugene Nikanorov
enikanorov at mirantis.com
Sat Mar 29 16:36:11 UTC 2014
Bug 1294603 has its root cause in LBaaS; it should be fixed by
https://review.openstack.org/#/c/81537/
Thanks,
Eugene.
On Fri, Mar 28, 2014 at 7:29 PM, Matt Riedemann
<mriedem at linux.vnet.ibm.com> wrote:
>
>
> On 3/27/2014 8:00 AM, Salvatore Orlando wrote:
>
>>
>> On 26 March 2014 19:19, James E. Blair <jeblair at openstack.org> wrote:
>>
>> Salvatore Orlando <sorlando at nicira.com> writes:
>>
>> > On another note, we noticed that the duplicated jobs currently executed
>> > for redundancy in neutron actually seem to point all to the same build id.
>> > I'm not sure then if we're actually executing each job twice or just
>> > duplicating lines in the jenkins report.
>>
>> Thanks for catching that, and I'm sorry that didn't work right. Zuul is
>> in fact running the jobs twice, but it is only looking at one of them
>> when sending reports and (more importantly) deciding whether the change
>> has succeeded or failed. Fixing this is possible, of course, but turns
>> out to be a rather complicated change. Since we don't make heavy use of
>> this feature, I lean toward simply instantiating multiple instances of
>> identically configured jobs and invoking them (e.g. "neutron-pg-1",
>> "neutron-pg-2").
>>
>> Matthew Treinish has already worked up a patch to do that, and I've
>> written a patch to revert the incomplete feature from Zuul.
>>
>>
>> That makes sense to me. I think it is just a matter of how the results
>> are reported to gerrit, since from what I gather in logstash the jobs are
>> executed twice for each new patchset or recheck.
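As a rough illustration of why identically named duplicate jobs end up reported
only once (this is not Zuul's actual code, just a sketch of the underlying
bookkeeping problem): if results are keyed by job name, the second run of the
same job simply overwrites the first, whereas distinct names such as
"neutron-pg-1" and "neutron-pg-2" keep both results visible to the gate.

    # Results keyed by job name: the duplicate overwrites the earlier
    # entry, so only one of the two runs is ever reported.
    results = {}
    for job_name, status in [("neutron-full", "SUCCESS"),
                             ("neutron-full", "FAILURE")]:
        results[job_name] = status
    print(results)  # {'neutron-full': 'FAILURE'} -- the first run is lost

    # With explicitly distinct job names, both runs are reported and can
    # both be taken into account when deciding success or failure.
    results = {}
    for job_name, status in [("neutron-pg-1", "SUCCESS"),
                             ("neutron-pg-2", "FAILURE")]:
        results[job_name] = status
    print(results)  # both results survive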
>>
>>
>> For the status of the full job, I had a look at the numbers reported by
>> Rossella.
>> All the bugs are already known; some of them are not even bugs; others
>> have been recently fixed (given the time span of Rossella's analysis, and
>> the fact that it also covers non-rebased patches, this kind of false
>> positive is to be expected).
>>
>> Of all full job failures, 44% should be discarded.
>> Bug 1291611 (12%) is definitely not a neutron bug... hopefully.
>> Bug 1281969 (12%) is really too generic.
>> It bears the hallmarks of bug 1283522, so the high number might be due to
>> the fact that trunk was plagued by that bug until a few days before the
>> analysis.
>> However, it's worth noting that there is also another instance of "lock
>> timeout" which has caused 11 failures in the full job in the past week.
>> A new bug has been filed for this issue:
>> https://bugs.launchpad.net/neutron/+bug/1298355
>> Bug 1294603 was related to a test that is now skipped. It is still being
>> debated whether the problem lies in test design, neutron LBaaS, or
>> neutron L3.
>>
>> The following bugs do not appear to be neutron bugs:
>> 1290642, 1291920, 1252971, 1257885.
>>
>> Bug 1292242 appears to have been fixed while the analysis was going on.
>> Bug 1277439 is already known to affect neutron jobs occasionally.
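To make the 44% figure concrete, here is a minimal sketch of the adjustment
being described; the total failure count is a made-up placeholder, only the
discard share comes from the analysis above.

    # Sketch of the failure-rate adjustment described above: if 44% of the
    # observed full-job failures trace back to non-neutron or already-fixed
    # bugs, only the remainder reflects the real state of the job.
    # NOTE: total_failures is a hypothetical placeholder, not a number
    # taken from the thread.
    total_failures = 100
    discard_share = 0.44
    relevant_failures = round(total_failures * (1 - discard_share))
    print("failures still attributable to the job itself: %d of %d"
          % (relevant_failures, total_failures))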
>>
>> The actual state of the job is perhaps better than the raw numbers
>> suggest. I would keep monitoring it, and then make it voting after the
>> Icehouse release is cut, so that we'll be able to deal with a possibly
>> higher failure rate in the "quiet" period of the release cycle.
>>
>
> I reported this bug [1] yesterday. This was hit in our internal Tempest
> runs on RHEL 6.5 with x86_64, the nova libvirt driver and the neutron
> openvswitch ML2 driver. We're running without tenant isolation on python
> 2.6 (no testr yet), so the tests run serially. We're running basically the
> full API/CLI/scenario tests though, with no filtering on the smoke tag.
>
> Out of 1,971 tests run, we had 3 failures where a nova instance failed to
> spawn because networking callback events failed, i.e. neutron sends a
> server event request to nova, the URL is bad, so the nova API rejects it
> and the networking request then fails in the neutron server. As linked in
> the bug report, I'm seeing the same neutron server log error showing up in
> logstash for community jobs, but it's not a 100% failure. I haven't seen
> the n-api log error show up in logstash though.
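For context, here is a minimal sketch of the kind of callback involved (this
is not neutron's actual notifier code; the endpoint path and event name follow
nova's os-server-external-events API, while the URL, token and UUID values are
illustrative placeholders).

    # Sketch of a neutron-style callback to nova's external events API.
    # If the configured nova URL is wrong, this POST fails and the
    # networking operation that triggered it fails in neutron-server.
    import json
    import urllib2  # python 2, matching the environment described above

    NOVA_URL = "http://nova-api.example.com:8774/v2/TENANT_ID"  # placeholder
    TOKEN = "ADMIN_TOKEN"                                        # placeholder

    body = json.dumps({"events": [{
        "name": "network-changed",        # event type nova understands
        "server_uuid": "INSTANCE_UUID",   # placeholder instance id
    }]})
    req = urllib2.Request(NOVA_URL + "/os-server-external-events", body,
                          {"Content-Type": "application/json",
                           "X-Auth-Token": TOKEN})
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        # A bad URL here is the kind of thing that produces the n-api and
        # neutron-server errors described in the bug report.
        print("server event callback failed: %s" % e)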
>
> Just bringing this to people's attention in case anyone else sees it.
>
> [1] https://bugs.launchpad.net/nova/+bug/1298640
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
>