[openstack-dev] Migrating to testr parallel in tempest

Dan Prince dprince at redhat.com
Wed Aug 14 20:23:59 UTC 2013



----- Original Message -----
> From: "Matthew Treinish" <mtreinish at kortar.org>
> To: openstack-dev at lists.openstack.org
> Sent: Tuesday, August 13, 2013 4:25:14 PM
> Subject: [openstack-dev] Migrating to testr parallel in tempest
> 
> 
> Hi everyone,
> 
> So for the past month or so I've been working on getting tempest to work
> stably with testr in parallel. As part of this you may have noticed the
> testr-full jobs that get run on the zuul check queue. I was using that job
> to debug some of the more obvious race conditions and stability issues with
> running tempest in parallel. After a bunch of fixes to tempest, and finding
> some real bugs in some of the projects, things seem to have smoothed out.
> 
> So I pushed the testr-full run to the gate queue earlier today. I'll be
> keeping track of the success rate of this job versus the serial job and
> using that as the determining factor before we push this live to be the
> default for all tempest runs. Assuming the success rate matches up well
> enough with the serial job on the gate queue, I will push out the change
> that migrates all the voting jobs to run in parallel, hopefully either
> Friday afternoon or early next week. Also, I'd welcome any input on what
> threshold people feel is good enough for this. For example, do we want to
> ensure a >= 1:1 match for job success? Or would something like 90% as
> stable as the serial job be good enough, considering the speed advantage?
> (The parallel runs take about half as much time as a full serial run; the
> parallel job normally finishes in ~25-30min.) Since this affects almost
> every project I don't want to define this threshold without input from
> everyone.

Nice work on the speedups!

Regarding stability: having tests that fail for "stability reasons" is concerning, especially if the failure rate is as high as 10%. I personally get frustrated even at the 1% level.

Let's see where the numbers fall. If we try it out, we can always switch back if it causes too many rechecks or failed gates, right?
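
As a rough sketch of how the two thresholds Matt mentions (>= 1:1, or 90% as stable as serial) could be checked once there is data: the function names and the run counts below are hypothetical, made up purely for illustration, and are not part of tempest, testr, or the real gate statistics.

```python
def success_rate(passed, total):
    """Fraction of job runs that passed."""
    if total <= 0:
        raise ValueError("need at least one run")
    return passed / total


def relative_stability(parallel, serial):
    """Parallel success rate as a fraction of the serial success rate.

    Each argument is a (passed, total) tuple of job run counts.
    """
    serial_rate = success_rate(*serial)
    if serial_rate == 0:
        raise ValueError("serial job never passed; ratio is undefined")
    return success_rate(*parallel) / serial_rate


def meets_threshold(parallel, serial, threshold=0.9):
    """True if the parallel job is at least `threshold` as stable as serial."""
    return relative_stability(parallel, serial) >= threshold


if __name__ == "__main__":
    # Made-up counts for illustration only, not real gate data.
    parallel_runs = (180, 200)
    serial_runs = (190, 200)
    print(round(relative_stability(parallel_runs, serial_runs), 3))
    print(meets_threshold(parallel_runs, serial_runs, threshold=0.9))
    print(meets_threshold(parallel_runs, serial_runs, threshold=1.0))
```

With a threshold of 1.0 this is the ">= 1:1" criterion; with 0.9 it is the "90% as stable" one. A ratio like this says nothing about how many rechecks it translates to, so it is only one input into the decision.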

If it does work, great. If not, I might prefer that we keep a leaner set of gating Tempest tests so that runtime stays down in serial mode while we work the bugs out of parallel mode.

Dan


> 
> After there is some more data for the gate queue's parallel job I'll have
> some pretty graphite graphs that I can share comparing the success trends
> between the parallel and serial jobs.
> 
> So at this point we're in the home stretch and I'm asking for everyone's
> help in getting this merged. If you are reviewing and pushing commits,
> please watch the results from these non-voting jobs; if things fail on the
> parallel job but not the serial job, please investigate the failure and
> open a bug if necessary. If it turns out to be a bug in tempest please link
> it against this blueprint:
> 
> https://blueprints.launchpad.net/tempest/+spec/speed-up-tempest
> 
> so that I'll give it the attention it deserves. I'd hate to get this close to
> getting this merged and have a bit of racy code get merged at the last second
> and block us for another week or two.
> 
> I feel that we need to get this in before the H3 rush starts up, as it will
> help everyone get through the extra review load faster.
> 
> Thanks,
> 
> Matt Treinish
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


