Improving stestr... or switching to a (better) compatible test runner?

30 Sep 2024

Hi,

This is the second time I'm opening a thread about this. Sorry, but it
really bothers me.

Today, running tempest, I got these stats from stestr:

  - Worker 0 (132 tests) => 0:24:49.209885
  - Worker 1 (100 tests) => 0:17:14.791845
  - Worker 2 (124 tests) => 0:42:52.690906
  - Worker 3 (189 tests) => 0:41:21.307241
  - Worker 4 (159 tests) => 0:45:49.503031
  - Worker 5 (143 tests) => 0:28:13.282371
  - Worker 6 (156 tests) => 3:16:52.364976
  - Worker 7 (103 tests) => 0:46:05.366089

So worker #1 ran for 17 minutes, then sat idle for the rest of worker
#6's run of more than 3 hours. I thought stestr divided the number of
tests evenly by the number of workers, so I don't understand why worker
#1 was assigned only 100 tests.

Altogether, all the tests could probably have run in less than an hour
(rather than the 3h16 above) if the tests that had not yet run were
reassigned to idle workers.
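
As a rough check of the "less than an hour" estimate, the sketch below sums the per-worker times from the stestr output above and divides by the number of workers. It assumes the work could be split perfectly evenly and that no single test runs longer than the resulting share, so it is a lower bound rather than a prediction. The names `WORKER_TIMES` and `parse_duration` are made up for this illustration.

```python
from datetime import timedelta

# Per-worker wall-clock times copied from the stestr output above.
WORKER_TIMES = [
    "0:24:49.209885",
    "0:17:14.791845",
    "0:42:52.690906",
    "0:41:21.307241",
    "0:45:49.503031",
    "0:28:13.282371",
    "3:16:52.364976",
    "0:46:05.366089",
]


def parse_duration(text):
    """Convert an 'H:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=float(seconds)
    )


durations = [parse_duration(t) for t in WORKER_TIMES]
total = sum(durations, timedelta())   # total work across all workers
ideal = total / len(durations)        # per-worker share if perfectly balanced
longest = max(durations)              # what the run actually took

print("total work:       ", total)
print("ideal per worker: ", ideal)
print("actual wall clock:", longest)
```

With these figures the ideal share comes out at roughly 55 minutes per worker, against the 3h16 the slowest worker actually needed.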

I currently spend a lot of time running tempest: I run it on each
OpenStack upgrade, from Victoria up to Dalmatian. With the way stestr
currently runs, that may take 2 full days (not even counting the times
an upgrade fails and needs fixing...), when it could be done in maybe
8 hours (counting 1h per OpenStack release).

So, knowing the above, it might be a good use of my time to dig into 
stestr and see if I can fix this... or not!

Does anyone have a suggestion for another test runner that's compatible
with stestr, at least for test selection with a regular expression?
As far as I know, pytest cannot take a regexp for test selection, can
it? Or is there maybe a plugin for it?

If there's no compatible test runner, where should I dig in the stestr 
code to rewrite things in a smarter way?
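
To make the "reassign to idle workers" idea concrete, here is a minimal simulation comparing an up-front split with a shared queue that idle workers pull from. This is not stestr code and makes no claim about how stestr actually partitions tests: the round-robin split is only a stand-in for any assignment decided before the run starts, and the test durations are hypothetical.

```python
import heapq


def static_makespan(durations, workers):
    """Assign tests round-robin up front; return the slowest worker's time."""
    loads = [0.0] * workers
    for index, duration in enumerate(durations):
        loads[index % workers] += duration
    return max(loads)


def dynamic_makespan(durations, workers):
    """Let whichever worker becomes free first take the next pending test."""
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


# Hypothetical durations in seconds: mostly quick tests, plus slow ones that
# all happen to land on the same worker under a round-robin split.
durations = [600 if i % 4 == 0 else 10 for i in range(40)]
workers = 4

print("lower bound:", sum(durations) / workers)
print("static split:", static_makespan(durations, workers))
print("shared queue:", dynamic_makespan(durations, workers))
```

The shared queue needs no timing data from earlier runs, which is the main reason to prefer it in a sketch like this; a real runner would also have to deal with test grouping and per-worker setup costs, which are ignored here.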

Cheers,

Thomas Goirand (zigo)

P.S.: On Caracal today, I just got this:

==============
Worker Balance
==============
  - Worker 0 (117 tests) => 0:59:32.291385
  - Worker 1 (147 tests) => 1:22:28.228980
  - Worker 2 (125 tests) => 0:45:20.969397
  - Worker 3 (114 tests) => 1:46:21.667579
  - Worker 4 (170 tests) => 0:45:27.577738
  - Worker 5 (190 tests) => 2:28:39.744920
  - Worker 6 (182 tests) => 2:29:01.402255
  - Worker 7 (152 tests) => 2:29:10.183359

This looks better, but that's probably 1/ random and 2/ still not
perfect, with workers #0, #2 and #4 doing nothing two-thirds of the time.
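
The "two-thirds" figure can be checked from the Caracal numbers above. The sketch assumes all workers start together and the run ends when the slowest one finishes, so a worker's idle time is the gap between its own time and the longest one. `CARACAL_TIMES` and `parse_duration` are names invented for this example.

```python
from datetime import timedelta

# Per-worker times copied from the Caracal "Worker Balance" output above.
CARACAL_TIMES = {
    0: "0:59:32.291385",
    1: "1:22:28.228980",
    2: "0:45:20.969397",
    3: "1:46:21.667579",
    4: "0:45:27.577738",
    5: "2:28:39.744920",
    6: "2:29:01.402255",
    7: "2:29:10.183359",
}


def parse_duration(text):
    """Convert an 'H:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=float(seconds)
    )


busy = {worker: parse_duration(t) for worker, t in CARACAL_TIMES.items()}
wall_clock = max(busy.values())

# Dividing one timedelta by another yields a plain float ratio.
idle_fraction = {worker: 1 - b / wall_clock for worker, b in busy.items()}

for worker, fraction in sorted(idle_fraction.items()):
    print(f"Worker {worker}: idle {fraction:.0%} of the run")
```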