[openstack-dev] Migrating to testr parallel in tempest
mtreinish at kortar.org
Fri Aug 16 19:34:01 UTC 2013
On Fri, Aug 16, 2013 at 01:03:57PM -0500, Ben Nemec wrote:
> >>>Getting this in before the H3 rush would be very helpful. When we made
> >>>the switch with Nova's unittests we fixed as many of the test bugs
> >>>that we could find, merged the change to switch the test runner, then
> >>>treated all failures as very high priority bugs that received
> >>>immediate attention. Getting this in before H3 will give everyone a
> >>>little more time to debug any potential new issues exposed by Jenkins
> >>>or people running the tests locally.
> >>>I think we should be bold here and merge this as soon as we have good
> >>>numbers that indicate the trend is for these tests to pass. Graphite
> >>>can give us the pass-to-fail ratios over time; as long as these trends
> >>>are similar for both the old nosetest jobs and the new testr job, I say
> >>>we go for it. (Disclaimer: most of the projects I work on are not
> >>>affected by the tempest jobs; however, I am often called upon to help
> >>>sort out issues in the gate.)
> >>I'm inclined to agree. It's not as if we don't have transient
> >>failures now, and if we're looking at a 50% speedup in
> >>recheck/verify times then as long as the new version isn't
> >>significantly less stable it should be a net improvement.
> >>Of course, without hard numbers we're kind of discussing in a vacuum.
> >I also would like to get this in sooner rather than later and fix the
> >bugs as they come in. But, I'm wary of doing this because there isn't
> >a proven success history yet. No one likes gate resets, and I've only
> >been running it on the gate queue for a day now.
> >So here is the graphite graph that I'm using to watch parallel vs
> >serial in the
> >gate queue:
> Okay, so what are the y-axis units on this? Because just guessing I
> would say it's the percentage of failing runs, in which case it looks
> like we're already within the 95%-accurate range (it never dips below
> -.05). Am I reading it right?
Yeah, I'm not sure what scale it is using either. I'm not sure it's a
percentage, or if it is then it's not grouping things over a long period
of time to calculate the percentage. I just know, from manually
correlating with what I saw watching zuul, that -0.02 was one failure
and -0.03 should be two.
This graph might be easier to read:
For this one I told graphite to total the events in 1-hour buckets. This
time the y-axis is the number of runs. It plots the difference between
serial and parallel results, so as before, above 0 on the y-axis means
that many more jobs passed in that hour. I split out a line each for
success, failure, and aborted.
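For reference, a graphite target along these lines is what produces that kind of hourly bucketing (the metric paths below are illustrative guesses, not the exact statsd names zuul emits):

```
# hypothetical metric paths -- substitute whatever zuul/statsd actually records
diffSeries(
    summarize(stats_counts.zuul.job.tempest-testr-full.SUCCESS, "1h", "sum"),
    summarize(stats_counts.zuul.job.tempest-full.SUCCESS, "1h", "sum"))
```

summarize() does the 1-hour grouping and diffSeries() subtracts the second (serial) series from the first (parallel) one.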
The aborted number is actually pretty important. I noticed that if there
is a gate reset (or a bunch of them) when the queue is pretty deep, the
testr runs often finish before the job at the head of the queue fails.
So they get marked as failures, but the full jobs never finish and get
marked as aborted. A good example of this is between late Aug 14 and
early Aug 15 on the plot. That is when there was an intermittent test
failure with horizon, which was fixed by a revert the next morning.
What all this exercise has really shown me, though, is that graphing the
results isn't exactly straightforward or helpful unless everything we're
measuring is gating.
So as things sit now we've found ~5 more races and/or flaky tests while
running tempest in parallel. 2 have fixes in progress:
Then I have open bugs for the remaining 3 here:
I haven't seen any other repeating failures besides these 3, and no one
else has opened a bug regarding a parallel failure. (Although I doubt
anyone is paying attention to the fails; I know I wouldn't. :) ) So
there may be others happening more infrequently that are being hidden
by these 3.
At this point, given the frequency I've seen the testr run fail, I'm not
sure it is ready yet. But at the same time, the longer we wait the more
bugs can be introduced. Maybe there is some middle ground, like marking
the parallel job as voting on the check queue.
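If we did go that middle-ground route, it should amount to a small change in the zuul layout; something like the sketch below (the job name is a guess, purely to illustrate):

```yaml
# sketch of a zuul layout.yaml job entry; deleting the "voting: false"
# line (or setting it to true) makes the check-queue job's result count
jobs:
  - name: gate-tempest-devstack-vm-testr-full
    voting: false
```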
> >On that graph the blue and yellow show the number of jobs grouped
> >into per-hour buckets (yellow being parallel and blue serial).
> >Then the red line shows failures: a horizontal bar means that there
> >is no difference in the number of failures between serial and
> >parallel. When it dips negative it is showing a failure in parallel
> >that wasn't on a serial run at the same time. When it goes positive
> >it is showing a failure on serial that doesn't occur on parallel at
> >the same time. But, because the serial runs take longer, the failures
> >happen at an offset. So if the plot shows a parallel fail followed
> >closely by a serial failure, then that is probably the same commit
> >and not a parallel-specific issue.
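To make that red line concrete, the bucketing and diffing it plots amount to roughly this (a Python sketch with an assumed (timestamp, status) event format; this is for illustration, not what graphite actually runs):

```python
from collections import Counter

def hourly_diff(serial_events, parallel_events):
    """Return serial-minus-parallel failure counts per hour bucket.

    Each event is an (epoch_seconds, status) tuple; status "FAILURE"
    counts as a failure. A negative value for an hour mirrors the
    graph's dip: a parallel failure with no matching serial one.
    """
    def fails_per_hour(events):
        buckets = Counter()
        for ts, status in events:
            if status == "FAILURE":
                buckets[int(ts) // 3600] += 1
        return buckets

    s = fails_per_hour(serial_events)
    p = fails_per_hour(parallel_events)
    # Counter returns 0 for missing hours, so the subtraction is safe.
    return {hour: s[hour] - p[hour] for hour in set(s) | set(p)}
```

A parallel failure with no serial counterpart in the same hour yields a negative bucket, matching the dips described above.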
> >Based on the results so far it looks like there is probably still a
> >race or 2 which would cause gate resets more than once a day if we
> >move to parallel. But it's getting closer; what does everyone think?
> >My only concern is that in the time it takes me to track down these
> >and get the fixes merged, something new will pop up. For example, the
> >last time it got almost this close, a nova patch got merged that broke
> >almost all the aggregates tests in tempest when running in parallel,
> >which prevented any run from
> >Matt Treinish