[openstack-dev] [gate] large-ops failure spike

Matthew Treinish mtreinish at kortar.org
Wed Jan 20 15:03:20 UTC 2016

On Wed, Jan 20, 2016 at 07:45:16AM -0500, Sean Dague wrote:
> The large-ops jobs jumped to a 50% fail in check, 25% fail in gate in
> the last 24 hours.
> http://tinyurl.com/j5u4nf5
> There isn't an obvious culprit at this point. I spent some time this

There is a very obvious culprit: pip 8 was released last night [1][2]. Every
dsvm job was failing between the release and when the fixes [3][4][5] landed,
so every dsvm job will show a spike like this. That graph uses a 12-hour
rolling average, and the fixes landed less than 12 hours ago.
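To illustrate why a trailing average still looks bad after the breakage is
fixed, here is a minimal sketch with made-up numbers. The hourly failure
fractions, the 6-hour breakage window and the rolling_mean() helper are all
hypothetical; only the 12-hour window comes from the graph above.

```python
def rolling_mean(values, window):
    """Trailing mean over (up to) the `window` most recent samples."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical hourly failure fractions for a dsvm job (illustrative only).
BASELINE = 0.05  # assumed normal failure rate
BROKEN = 1.0     # assumed: every run fails while pip 8 breaks the job
WINDOW = 12      # 12-hour rolling window, as on the graph

hourly = [BASELINE] * 12 + [BROKEN] * 6 + [BASELINE] * 12
fix_hour = 18    # index of the first hour after the fixes landed
smoothed = rolling_mean(hourly, WINDOW)

for h in (fix_hour, fix_hour + 6, fix_hour + 11):
    print("hours since fix:", h - fix_hour, "smoothed fail rate:", round(smoothed[h], 3))
```

With these assumed numbers the smoothed rate is still above 40% six hours
after the fix, and only drops back to the baseline once all of the broken
hours have aged out of the 12-hour window.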

> morning digging into it a bit. Possibly each individual instance build
> got slower, possibly some other timeout is getting hit.
> The large-ops jobs were largely maintained by Joe Gordon, who dug into
> them when there were issues. He's not part of the community any more,
> and I don't think there is currently a point person.

I think you're conflating adding the jobs with maintaining them. Joe did
initially add the jobs, but he wasn't as active a maintainer as you're implying
here; well, no more so than he was for any other dsvm failure. Not having him
around to help with failures anymore is an issue for all jobs, not just the
ones he added.

> With no current maintainer, I'd suggest we make the jobs non voting -
> https://review.openstack.org/#/c/270141/

I'm -1 on this; we really don't want to remove jobs like this until we have
equivalent coverage set up somewhere. Frankly, there should just be a nova
functional test that does similar load testing with the fake virt driver. But
until that's done, I think it's premature to make these non-voting.

> I also suggest their time has probably come and gone. There is no one
> active on them, and the Rally team is.
> A pre-gating test job is only useful if someone is actively addressing
> systematic fails. This job class no longer has it. We should thus retire it.

While I agree with the sentiment, I don't think this actually applies in
practice; the idea of a formal maintainer for a job is kind of a pipe dream.
Look at http://status.openstack.org/elastic-recheck/data/uncategorized.html,
try to identify the maintainers for all the jobs listed there, and ask why they
have uncategorized failures. Are you saying we should retire all those jobs
because there isn't anyone signed up (in the non-existent registry of job
maintainers) to watch the failures?

-Matt Treinish

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-January/084475.html
[2] https://github.com/pypa/pip/issues/3384
[3] https://review.openstack.org/269954
[4] https://review.openstack.org/269970
[5] https://review.openstack.org/269969