[openstack-dev] Migrating to testr parallel in tempest
Ben Nemec
openstack at nemebean.com
Wed Aug 14 16:05:35 UTC 2013
On 2013-08-13 16:39, Clark Boylan wrote:
> On Tue, Aug 13, 2013 at 1:25 PM, Matthew Treinish
> <mtreinish at kortar.org> wrote:
>>
>> Hi everyone,
>>
>> So for the past month or so I've been working on getting tempest to work
>> stably with testr in parallel. As part of this you may have noticed the
>> testr-full jobs that get run on the zuul check queue. I was using that job
>> to debug some of the more obvious race conditions and stability issues with
>> running tempest in parallel. After a bunch of fixes to tempest and finding
>> some real bugs in some of the projects things seem to have smoothed out.
>>
>> So I pushed the testr-full run to the gate queue earlier today. I'll be
>> keeping track of the success rate of this job vs the serial job and use this
>> as the determining factor before we push this live to be the default for all
>> tempest runs. So assuming that the success rate matches up well enough with
>> the serial job on the gate queue, I will push out the change that will
>> migrate all the voting jobs to run in parallel, hopefully either Friday
>> afternoon or early next week. Also, if anyone has any input on what
>> threshold they feel is good enough for this, I'd welcome it. For example,
>> do we want to ensure a >= 1:1 match for job success? Or would something
>> like 90% as stable as the serial job be good enough considering the speed
>> advantage? (The parallel runs take about half as much time as a full serial
>> run; the parallel job normally finishes in ~25-30 min.) Since this affects
>> almost every project I don't want to define this threshold without input
>> from everyone.
>>
>> After there is some more data for the gate queue's parallel job I'll have
>> some pretty graphite graphs that I can share comparing the success trends
>> between the parallel and serial jobs.
>>
>> So at this point we're in the home stretch and I'm asking for everyone's
>> help in getting this merged. Could everyone who is reviewing and pushing
>> commits watch the results from these non-voting jobs? If things fail on the
>> parallel job but not the serial job, please investigate the failure and
>> open a bug if necessary. If it turns out to be a bug in tempest, please
>> link it against this blueprint:
>>
>> https://blueprints.launchpad.net/tempest/+spec/speed-up-tempest
>>
>> so that I'll give it the attention it deserves. I'd hate to get this close
>> to getting this merged and have a bit of racy code get merged at the last
>> second and block us for another week or two.
>>
>> I feel that we need to get this in before the H3 rush starts up as it will
>> help everyone get through the extra review load faster.
>>
> Getting this in before the H3 rush would be very helpful. When we made
> the switch with Nova's unittests we fixed as many of the test bugs
> that we could find, merged the change to switch the test runner, then
> treated all failures as very high priority bugs that received
> immediate attention. Getting this in before H3 will give everyone a
> little more time to debug any potential new issues exposed by Jenkins
> or people running the tests locally.
>
> I think we should be bold here and merge this as soon as we have good
> numbers that indicate the trend is for these tests to pass. Graphite
> can give us the pass to fail ratios over time, as long as these trends
> are similar for both the old nosetest jobs and the new testr job I say
> we go for it. (Disclaimer: most of the projects I work on are not
> affected by the tempest jobs; however, I am often called upon to help
> sort out issues in the gate).
I'm inclined to agree. It's not as if we don't have transient failures
now, and if we're looking at a 50% speedup in recheck/verify times then
as long as the new version isn't significantly less stable it should be
a net improvement.
Of course, without hard numbers we're kind of discussing in a vacuum
here.
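As a rough way to frame the trade-off until real data shows up, here is a
minimal sketch of the arithmetic. Everything numeric in it is an assumption for
illustration: the 60 and 30 minute run times are rounded from the "about half"
and "~25-30min" figures quoted above, the pass rates are made up, and rechecks
are assumed to be independent and run one after another. The function names
are hypothetical, not part of any tool.

```python
def expected_minutes_to_green(run_minutes, pass_rate):
    """Expected wall-clock minutes until the first passing run.

    Assumes each run passes independently with probability pass_rate and a
    failed run is simply rechecked, so the expected number of runs is
    1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return run_minutes / pass_rate


def break_even_stability(parallel_minutes, serial_minutes):
    """Relative stability (parallel pass rate / serial pass rate) at which
    both jobs have the same expected time to a passing run."""
    return parallel_minutes / serial_minutes


# Hypothetical inputs, not measured data.
SERIAL_MINUTES = 60.0
PARALLEL_MINUTES = 30.0
SERIAL_PASS_RATE = 0.90                       # made-up number
PARALLEL_PASS_RATE = 0.90 * SERIAL_PASS_RATE  # "90% as stable as serial"

serial_cost = expected_minutes_to_green(SERIAL_MINUTES, SERIAL_PASS_RATE)
parallel_cost = expected_minutes_to_green(PARALLEL_MINUTES, PARALLEL_PASS_RATE)

print("serial:   %.1f min expected to green" % serial_cost)
print("parallel: %.1f min expected to green" % parallel_cost)
print("break-even relative stability: %.2f"
      % break_even_stability(PARALLEL_MINUTES, SERIAL_MINUTES))
```

Under those assumptions the parallel job stays ahead on expected time to a
passing run as long as it is more than about half as stable as the serial job.
That ignores the cost of people chasing spurious failures, which is the part
that needs the real numbers.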
-Ben