Hi,
This is the second time I'm opening a thread about this. Sorry, but
it really bothers me.
Today, running tempest, I got these stats from stestr:
- Worker 0 (132 tests) => 0:24:49.209885
- Worker 1 (100 tests) => 0:17:14.791845
- Worker 2 (124 tests) => 0:42:52.690906
- Worker 3 (189 tests) => 0:41:21.307241
- Worker 4 (159 tests) => 0:45:49.503031
- Worker 5 (143 tests) => 0:28:13.282371
- Worker 6 (156 tests) => 3:16:52.364976
- Worker 7 (103 tests) => 0:46:05.366089
So worker #1 ran for 17 minutes, then sat idle for the rest of the
3-hour run of worker #6. I thought stestr was dividing the number of
tests by the number of workers, so I don't get why worker #1 only had
100 tests assigned.
Altogether, all tests could have been run in maybe less than an hour
(rather than the 3h16 above) if the not-yet-run tests had been
reassigned to idle workers.
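As a rough check of that estimate, here is a minimal Python sketch that
sums the per-worker times from the first run above and divides by the
number of workers. It assumes the work could be split perfectly evenly
and that no single test takes longer than that average, so the result
is only a lower bound. The helper name to_seconds is mine, not
something from stestr.

```python
from datetime import timedelta

# Per-worker wall-clock times copied from the stestr output above.
WORKER_TIMES = [
    "0:24:49.209885",
    "0:17:14.791845",
    "0:42:52.690906",
    "0:41:21.307241",
    "0:45:49.503031",
    "0:28:13.282371",
    "3:16:52.364976",
    "0:46:05.366089",
]


def to_seconds(stamp: str) -> float:
    """Convert an H:MM:SS.ffffff string to seconds (hypothetical helper)."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


durations = [to_seconds(t) for t in WORKER_TIMES]
total = sum(durations)
ideal = total / len(durations)  # perfectly even split across workers
longest = max(durations)        # what the run actually took

print(f"ideal per-worker time : {timedelta(seconds=round(ideal))}")
print(f"actual wall-clock time: {timedelta(seconds=round(longest))}")
```

With these numbers the even split comes out at roughly 55 minutes,
against 3h16 for the slowest worker.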
I'm currently spending a lot of time running tempest. Indeed, I'm
running tempest on each OpenStack upgrade, from Victoria up to
Dalmatian. With the way stestr currently runs, it may take 2 full days
to do that (and that's not even counting the times an upgrade fails
and needs fixing...), when it could be done in maybe 8 hours (if I
count 1h per OpenStack release).
So, knowing the above, it might be a good use of my time to dig into
stestr and see if I can fix this... or not!
Does anyone have a suggestion for another test runner that's compatible
with stestr, at least for test selection with a regular expression?
As far as I know, pytest cannot take a regexp for test selection, can
it? Or is there maybe a plugin for it?
If there's no compatible test runner, where should I dig in the stestr
code to rewrite things in a smarter way?
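To illustrate what I mean by "smarter", here is a small simulation
comparing an up-front split with a shared queue where each worker takes
the next test as soon as it is idle. This is only a sketch: the
round-robin split is my guess at a static assignment, not a description
of what stestr really does, and the durations are made up.

```python
import heapq


def static_round_robin(durations, workers):
    """Assign tests to workers up front, in turn; return the total run time."""
    loads = [0.0] * workers
    for index, duration in enumerate(durations):
        loads[index % workers] += duration
    return max(loads)


def dynamic_queue(durations, workers):
    """Give each test to whichever worker becomes idle first."""
    # Min-heap of the times at which each worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Made-up durations in seconds: mostly quick tests, plus a few slow ones
# that happen to land on the same worker with a round-robin split.
durations = [600, 5, 5, 5, 600, 5, 5, 5, 600, 5, 5, 5]

print("static :", static_round_robin(durations, workers=4))
print("dynamic:", dynamic_queue(durations, workers=4))
```

In this toy case the static split is held up by the one worker that got
all three slow tests, while the shared queue spreads them out.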
Cheers,
Thomas Goirand (zigo)
P.S: On Caracal today, I just had this:
==============
Worker Balance
==============
- Worker 0 (117 tests) => 0:59:32.291385
- Worker 1 (147 tests) => 1:22:28.228980
- Worker 2 (125 tests) => 0:45:20.969397
- Worker 3 (114 tests) => 1:46:21.667579
- Worker 4 (170 tests) => 0:45:27.577738
- Worker 5 (190 tests) => 2:28:39.744920
- Worker 6 (182 tests) => 2:29:01.402255
- Worker 7 (152 tests) => 2:29:10.183359
This looks better, but that's 1/ probably random and 2/ still not
perfect, with workers #0, #2 and #4 doing nothing 2/3 of the time.