[openstack-dev] [qa] [neutron] Neutron Full Parallel job - Last 4 days failures

Matt Riedemann mriedem at linux.vnet.ibm.com
Fri Mar 28 15:29:12 UTC 2014



On 3/27/2014 8:00 AM, Salvatore Orlando wrote:
>
> On 26 March 2014 19:19, James E. Blair <jeblair at openstack.org> wrote:
>
>     Salvatore Orlando <sorlando at nicira.com>
>     writes:
>
>      > On another note, we noticed that the duplicated jobs currently
>      > executed for redundancy in neutron actually seem to point all to
>      > the same build id.
>      > I'm not sure then if we're actually executing each job twice or just
>      > duplicating lines in the jenkins report.
>
>     Thanks for catching that, and I'm sorry that didn't work right.  Zuul is
>     in fact running the jobs twice, but it is only looking at one of them
>     when sending reports and (more importantly) deciding whether the change
>     has succeeded or failed.  Fixing this is possible, of course, but turns
>     out to be a rather complicated change.  Since we don't make heavy use of
>     this feature, I lean toward simply instantiating multiple instances of
>     identically configured jobs and invoking them (e.g. "neutron-pg-1",
>     "neutron-pg-2").
>
>     Matthew Treinish has already worked up a patch to do that, and I've
>     written a patch to revert the incomplete feature from Zuul.
>
>
> That makes sense to me. I think it is just a matter of how the results
> are reported to gerrit, since from what I gather in logstash the jobs are
> executed twice for each new patchset or recheck.
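
(To illustrate the failure mode Jim describes, here's a tiny Python sketch --
not Zuul's actual code, just an illustration of why two runs of an
identically named job can collapse into a single reported result, and why
distinctly named copies ("neutron-pg-1", "neutron-pg-2") side-step it.  The
job names and results below are made up.)

def report(results):
    """results: list of (job_name, passed) tuples from completed builds."""
    by_name = {}
    for name, passed in results:
        # a second run with the same job name silently overwrites the first
        by_name[name] = passed
    return all(by_name.values()), by_name

# One job run twice under the same name: the failed run is dropped and the
# change would be reported as passing.
print(report([("neutron-full", False), ("neutron-full", True)]))
# -> (True, {'neutron-full': True})

# Distinctly named copies: both runs count toward the verdict.
print(report([("neutron-pg-1", True), ("neutron-pg-2", False)]))
# -> (False, {'neutron-pg-1': True, 'neutron-pg-2': False})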
>
>
> For the status of the full job, I took a look at the numbers reported
> by Rossella.
> All the bugs are already known; some of them are not even bugs; others
> have been recently fixed (given the time span of Rossella's analysis, and
> the fact that it also covers non-rebased patches, this kind of false
> positive is possible).
>
> Of all full job failures, 44% should be discarded.
> Bug 1291611 (12%) is definitely not a neutron bug... hopefully.
> Bug 1281969 (12%) is really too generic.
> It bears the hallmark of bug 1283522, and therefore the high number might
> be due to the fact that trunk was plagued by this bug up to a few days
> before the analysis.
> However, it's worth noting that there is also another instance of "lock
> timeout" which has caused 11 failures in the full job in the past week.
> A new bug has been filed for this issue:
> https://bugs.launchpad.net/neutron/+bug/1298355
> Bug 1294603 was related to a test that is now skipped. It is still being
> debated whether the problem lies in test design, neutron LBaaS or neutron L3.
>
> The following bugs seem not to be neutron bugs:
> 1290642, 1291920, 1252971, 1257885
>
> Bug 1292242 appears to have been fixed while the analysis was going on.
> Bug 1277439, instead, is already known to affect neutron jobs occasionally.
>
> The actual state of the job is perhaps better than what the raw numbers
> say. I would keep monitoring it, and then make it voting after the
> Icehouse release is cut, so that we'll be able to deal with a possibly
> higher failure rate in the "quiet" period of the release cycle.
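
(For anyone wanting to reproduce this kind of breakdown, here's a rough
Python sketch of the bookkeeping: classify each full-job failure against a
known bug and work out what fraction can be discarded.  The records below
are made up for illustration; only the bug numbers come from the analysis
above.)

from collections import Counter

# (build id, matched launchpad bug, discard?) -- illustrative records only
failures = [
    ("build-01", "1291611", True),    # not a neutron bug
    ("build-02", "1281969", True),    # signature too generic to act on
    ("build-03", "1298355", False),   # new "lock timeout" bug
    ("build-04", "1277439", False),   # known to hit neutron jobs occasionally
    ("build-05", None, False),        # unclassified, needs triage
]

per_bug = Counter(bug for _, bug, _ in failures if bug)
discardable = sum(1 for _, _, discard in failures if discard)

print("failures per bug:", dict(per_bug))
print("discardable: %.0f%%" % (100.0 * discardable / len(failures)))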
>
>
>
>     -Jim
>

I reported this bug [1] yesterday.  This was hit in our internal Tempest
runs on RHEL 6.5 with x86_64 and the nova libvirt driver with the
neutron openvswitch ML2 driver.  We're running without tenant isolation
on python 2.6 (no testr yet), so the tests run serially.  We're running
basically the full API/CLI/scenario tests though, with no filtering on
the smoke tag.

Out of 1,971 tests run, we had 3 failures where a nova instance failed
to spawn because networking callback events failed, i.e. neutron sends a
server event request to nova with a bad URL, so the nova API rejects it
and the networking request in the neutron server then fails.  As linked
in the bug report, I'm seeing the same neutron server log error showing
up in logstash for community jobs, but it's not a 100% failure.  I
haven't seen the n-api log error show up in logstash though.
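
For context, the callback in question is neutron notifying nova through
the os-server-external-events API.  Below is a minimal sketch of that
request using plain requests rather than the actual neutron notifier code;
the URL, token and function name are placeholders.  It just shows where a
misconfigured nova URL makes the event delivery fail:

import requests

NOVA_URL = "http://nova-api.example.com:8774/v2/%(tenant_id)s"  # placeholder
TOKEN = "<keystone-token>"                                      # placeholder

def send_vif_plugged_event(tenant_id, server_uuid, port_id):
    """Tell nova the VIF backing port_id on server_uuid is wired up."""
    body = {"events": [{"name": "network-vif-plugged",
                        "server_uuid": server_uuid,
                        "tag": port_id,
                        "status": "completed"}]}
    resp = requests.post(
        NOVA_URL % {"tenant_id": tenant_id} + "/os-server-external-events",
        json=body,
        headers={"X-Auth-Token": TOKEN})
    # With a bad nova URL this is where the notification dies, and the
    # instance spawn eventually fails on the nova side.
    resp.raise_for_status()
    return resp.json()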

Just bringing this to people's attention in case anyone else sees it.

[1] https://bugs.launchpad.net/nova/+bug/1298640

-- 

Thanks,

Matt Riedemann



