[openstack-dev] [qa] [neutron] Neutron Full Parallel job - Last 4 days failures

Salvatore Orlando sorlando at nicira.com
Thu Mar 27 13:00:40 UTC 2014


On 26 March 2014 19:19, James E. Blair <jeblair at openstack.org> wrote:

> Salvatore Orlando <sorlando at nicira.com> writes:
>
> > On another note, we noticed that the duplicated jobs currently executed
> > for redundancy in neutron actually seem to point all to the same build id.
> > I'm not sure then if we're actually executing each job twice or just
> > duplicating lines in the jenkins report.
>
> Thanks for catching that, and I'm sorry that didn't work right.  Zuul is
> in fact running the jobs twice, but it is only looking at one of them
> when sending reports and (more importantly) decided whether the change
> has succeeded or failed.  Fixing this is possible, of course, but turns
> out to be a rather complicated change.  Since we don't make heavy use of
> this feature, I lean toward simply instantiating multiple instances of
> identically configured jobs and invoking them (eg "neutron-pg-1",
> "neutron-pg-2").
>
> Matthew Treinish has already worked up a patch to do that, and I've
> written a patch to revert the incomplete feature from Zuul.
>

That makes sense to me. I think it is just a matter of how the results are
reported to gerrit, since from what I gather in logstash the jobs are
executed twice for each new patchset or recheck.
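
As a rough illustration of why separately named copies sidestep the
reporting problem, here is a minimal Python sketch. It is not Zuul code:
the function name and the results dictionary are assumptions made up for
this example. Each copy reports under its own name, so a failure in either
run is visible and counts against the change.

```python
def change_passes(results):
    """Return True only if every reported job run succeeded.

    `results` maps a job name to its outcome string ("SUCCESS" or "FAILURE").
    Each duplicate has its own name, so no run is hidden behind another.
    """
    return all(outcome == "SUCCESS" for outcome in results.values())


# Hypothetical outcomes for two identically configured jobs on one patchset.
results = {"neutron-pg-1": "SUCCESS", "neutron-pg-2": "FAILURE"}
print(change_passes(results))
```

With a single job name reported twice, the second outcome would overwrite
or shadow the first, which is roughly the symptom observed in the report.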


For the status of the full job, I had a look at the numbers reported by
Rossella.
All the bugs are already known; some of them are not even bugs; others have
been recently fixed (given the time span of Rossella's analysis and the
fact that it also covers non-rebased patches, this kind of false positive
is to be expected).

Of all full job failures, 44% should be discarded.
Bug 1291611 (12%) is definitely not a neutron bug... hopefully.
Bug 1281969 (12%) is really too generic.
It bears the hallmark of bug 1283522, and therefore the high number might
be due to the fact that trunk was plagued by this bug up to a few days
before the analysis.
However, it's worth noting that there is also another instance of "lock
timeout" which has caused 11 failures in the full job in the past week.
A new bug has been filed for this issue:
https://bugs.launchpad.net/neutron/+bug/1298355
Bug 1294603 was related to a test now skipped. It is still being debated
whether the problem lies in test design, neutron LBaaS or neutron L3.

The following bugs seem not to be neutron bugs:
1290642, 1291920, 1252971, 1257885

Bug 1292242 appears to have been fixed while the analysis was going on.
Bug 1277439 instead is already known to affect neutron jobs occasionally.

The actual state of the job is perhaps better than what the raw numbers
say. I would keep monitoring it, and then make it voting after the Icehouse
release is cut, so that we'll be able to deal with a possibly higher
failure rate in the "quiet" period of the release cycle.
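
To make the arithmetic behind "better than the raw numbers" concrete, here
is a small Python sketch. Only the 44% figure comes from the analysis
above; the function name and the raw rate used in the example are
hypothetical, since the measured rate is not restated in this message.

```python
def adjusted_failure_rate(raw_rate, discard_fraction=0.44):
    """Scale a raw failure rate down by the share of failures to discard.

    raw_rate: fraction of job runs that failed (0.0 to 1.0).
    discard_fraction: share of those failures judged not to count
    (0.44 is the figure quoted in this message).
    """
    return raw_rate * (1.0 - discard_fraction)


raw = 0.25  # hypothetical raw failure rate, for illustration only
print(f"{adjusted_failure_rate(raw):.2%}")
```

Under that assumption, a raw failure rate of 25% would correspond to about
14% once the discarded failures are excluded.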



> -Jim