[openstack-dev] Migrating to testr parallel in tempest

Ben Nemec openstack at nemebean.com
Wed Aug 14 16:05:35 UTC 2013


On 2013-08-13 16:39, Clark Boylan wrote:
> On Tue, Aug 13, 2013 at 1:25 PM, Matthew Treinish 
> <mtreinish at kortar.org> wrote:
>> 
>> Hi everyone,
>> 
>> So for the past month or so I've been working on getting tempest to work
>> stably with testr in parallel. As part of this you may have noticed the
>> testr-full jobs that get run on the zuul check queue. I was using that job
>> to debug some of the more obvious race conditions and stability issues with
>> running tempest in parallel. After a bunch of fixes to tempest and finding
>> some real bugs in some of the projects, things seem to have smoothed out.
>> 
>> So I pushed the testr-full run to the gate queue earlier today. I'll be
>> keeping track of the success rate of this job vs the serial job and use
>> this as the determining factor before we push this live to be the default
>> for all tempest runs. So assuming that the success rate matches up well
>> enough with the serial job on the gate queue, I will push out the change
>> that will migrate all the voting jobs to run in parallel, hopefully either
>> Friday afternoon or early next week. Also, if anyone has input on what
>> threshold they feel is good enough for this, I'd welcome it. For example,
>> do we want to ensure a >= 1:1 match for job success? Or would something
>> like 90% as stable as the serial job be good enough considering the speed
>> advantage? (The parallel runs take about half as much time as a full serial
>> run; the parallel job normally finishes in ~25-30min.) Since this affects
>> almost every project I don't want to define this threshold without input
>> from everyone.
>> 
>> After there is some more data for the gate queue's parallel job I'll have
>> some pretty graphite graphs that I can share comparing the success trends
>> between the parallel and serial jobs.
>> 
>> So at this point we're in the home stretch and I'm asking for everyone's
>> help in getting this merged. So, everyone who is reviewing and pushing
>> commits: please watch the results from these non-voting jobs, and if things
>> fail on the parallel job but not the serial job, investigate the failure
>> and open a bug if necessary. If it turns out to be a bug in tempest please
>> link it against this blueprint:
>> 
>> https://blueprints.launchpad.net/tempest/+spec/speed-up-tempest
>> 
>> so that I'll give it the attention it deserves. I'd hate to get this close
>> to getting this merged and have a bit of racy code get merged at the last
>> second and block us for another week or two.
>> 
>> I feel that we need to get this in before the H3 rush starts up as it will
>> help everyone get through the extra review load faster.
>> 
> Getting this in before the H3 rush would be very helpful. When we made
> the switch with Nova's unittests we fixed as many of the test bugs
> as we could find, merged the change to switch the test runner, then
> treated all failures as very high priority bugs that received
> immediate attention. Getting this in before H3 will give everyone a
> little more time to debug any potential new issues exposed by Jenkins
> or people running the tests locally.
> 
> I think we should be bold here and merge this as soon as we have good
> numbers that indicate the trend is for these tests to pass. Graphite
> can give us the pass-to-fail ratios over time; as long as these trends
> are similar for both the old nosetest jobs and the new testr job, I say
> we go for it. (Disclaimer: most of the projects I work on are not
> affected by the tempest jobs; however, I am often called upon to help
> sort out issues in the gate.)

I'm inclined to agree.  It's not as if we don't have transient failures 
now, and if we're looking at a 50% speedup in recheck/verify times then 
as long as the new version isn't significantly less stable it should be 
a net improvement.
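
To make that trade-off a bit more concrete, here is a rough
back-of-the-envelope sketch in Python. Everything numeric in it is an
assumption for illustration, not measured gate data: the run times are
guessed from the "~25-30min, about half of serial" figure quoted above,
the pass rates are made up, and the model treats every recheck as an
independent retry, which transient failures only approximate. The
function names are hypothetical.

```python
def expected_minutes_to_pass(run_minutes, pass_rate):
    """Expected wall-clock minutes until the first passing run.

    Assumes each run passes independently with probability pass_rate,
    so the number of runs needed is geometric with mean 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return run_minutes / pass_rate


def break_even_pass_rate(serial_minutes, serial_pass_rate, parallel_minutes):
    """Parallel pass rate at which both jobs cost the same expected time."""
    return serial_pass_rate * parallel_minutes / serial_minutes


# Hypothetical inputs, NOT real gate numbers.
SERIAL_MINUTES = 55.0       # assumed: roughly twice the parallel run
PARALLEL_MINUTES = 27.5     # assumed: middle of the quoted ~25-30min
SERIAL_PASS_RATE = 0.90     # made up
PARALLEL_PASS_RATE = 0.81   # made up: "90% as stable" as the serial job

if __name__ == "__main__":
    serial = expected_minutes_to_pass(SERIAL_MINUTES, SERIAL_PASS_RATE)
    parallel = expected_minutes_to_pass(PARALLEL_MINUTES, PARALLEL_PASS_RATE)
    threshold = break_even_pass_rate(
        SERIAL_MINUTES, SERIAL_PASS_RATE, PARALLEL_MINUTES
    )
    print("serial:     %.1f expected minutes per passing run" % serial)
    print("parallel:   %.1f expected minutes per passing run" % parallel)
    print("break-even: parallel pass rate of %.2f" % threshold)
```

Under this simple model the parallel job wins whenever its pass rate is
above the break-even value, which for a halved run time is half the
serial pass rate. It ignores the cost of people chasing spurious
failures, so it is an optimistic bound rather than an argument for
accepting a much flakier job.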

Of course, without hard numbers we're kind of discussing in a vacuum 
here.

-Ben


