On Tue, Jul 09, 2024 at 07:49:43AM -0700, Clark Boylan wrote:
On Tue, Jul 9, 2024, at 1:37 AM, smooney@redhat.com wrote:
On Tue, 2024-07-09 at 04:38 +0200, Thomas Goirand wrote:
Hi,
Currently, it looks like stestr calculates which thread should run which test at the beginning, by computing a partition, and then launches all the tests at once. With a lot of threads, the result is that, at the end, only a few cores are in use, with all the others idle.

Currently, unless you override it, my understanding is that the distribution of tests is done at the class level in a round-robin manner across the worker threads. As you said, this is done statically before the worker threads are launched, by generating a file with the relevant distribution at start-up.
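To make that static behaviour concrete, here is a rough sketch in Python (not stestr's actual code) of dealing test classes out round-robin to a fixed number of workers before anything starts running:

```python
# A rough sketch (not stestr's actual scheduler) of the static approach:
# group tests by class, then deal the classes out round-robin to a fixed
# number of workers before any of them starts running.
from collections import defaultdict
from itertools import cycle


def round_robin_partition(test_ids, num_workers):
    """Return num_workers lists of test ids, partitioned by class."""
    by_class = defaultdict(list)
    for test_id in test_ids:
        class_name, _, _ = test_id.rpartition(".")
        by_class[class_name].append(test_id)

    partitions = [[] for _ in range(num_workers)]
    for worker, tests in zip(cycle(range(num_workers)), by_class.values()):
        partitions[worker].extend(tests)
    return partitions


# With 3 workers, each class lands wholly on one worker.
print(round_robin_partition(
    [
        "tests.test_api.TestAPI.test_get",
        "tests.test_api.TestAPI.test_put",
        "tests.test_db.TestDB.test_query",
        "tests.test_cli.TestCLI.test_help",
    ],
    3,
))
```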
The docs [0] say this "Currently the partitioning algorithm is simple round-robin for tests that stestr has not seen run before, and equal-time buckets for tests that stestr has seen run." which maintains the old testr behavior. Basically if there is run information in the stestr database it should use that to bucket tests more evenly. Are you seeing this behavior in a fresh checkout or with existing data in your database? One option for CI would be to record historical runs and preseed the database in fresh checkouts with that information.
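For the second half of that quoted sentence, the equal-time buckets, one common greedy way to build roughly balanced buckets from recorded durations looks like the sketch below. This is illustrative only; stestr's real scheduler may differ in detail.

```python
# A sketch of the "equal-time buckets" idea: greedily place each test
# (longest first) onto whichever worker currently has the least total
# runtime, using durations recorded from a previous run.
import heapq


def time_bucket_partition(timed_tests, num_workers):
    """timed_tests: iterable of (test_id, duration_in_seconds) pairs."""
    heap = [(0.0, i) for i in range(num_workers)]  # (assigned seconds, worker)
    partitions = [[] for _ in range(num_workers)]
    for test_id, duration in sorted(timed_tests, key=lambda t: -t[1]):
        total, worker = heapq.heappop(heap)
        partitions[worker].append(test_id)
        heapq.heappush(heap, (total + duration, worker))
    return partitions


# The one slow test ends up alone; the short ones share the other worker.
print(time_bucket_partition(
    [("test_slow", 30.0), ("test_mid", 10.0), ("test_a", 1.0), ("test_b", 1.0)],
    2,
))
```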
Would there be a way to have a pool of available threads instead, and have stestr give threads something to eat when they become available, instead of the current way?
I think, based on how this currently works, that would require us to repeatedly spawn threads and generate new workers after every class: effectively pre-generate a set of task files at start-up, and when one thread completes, grab the next file and launch a new thread.
If you actually wanted to do this with a thread pool and dispatch tests into it, I think that would be a larger rewrite.
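For contrast, the dynamic model being asked about would look roughly like the sketch below: a fixed pool of workers pulling the next class from a shared queue as each one frees up. The names are illustrative; this is not how stestr is structured today.

```python
# A minimal sketch of the dynamic alternative: a fixed pool of workers that
# pull the next test class off a shared queue as soon as they finish the
# previous one, instead of receiving a precomputed list up front.
import queue
import threading


def run_test_class(class_name):
    # Placeholder for "run every test in this class and stream the results".
    print(f"{threading.current_thread().name} running {class_name}")


def worker(work_queue):
    while True:
        try:
            class_name = work_queue.get_nowait()
        except queue.Empty:
            return
        run_test_class(class_name)


def run_dynamic(test_classes, num_workers):
    work_queue = queue.Queue()
    for class_name in test_classes:
        work_queue.put(class_name)
    threads = [
        threading.Thread(target=worker, args=(work_queue,), name=f"worker-{i}")
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_dynamic(["TestAPI", "TestDB", "TestCLI", "TestScheduler"], num_workers=2)
```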
How much work would that be?
I'm not very familiar with the workings of this, although I have had to debug it once or twice a few years ago due to gate issues, and I'm not convinced it would be that easy to do in a more dynamic way. You could likely hack together the approach of generating many test-list files and spawning threads up to n with less work than properly using a thread pool and queuing the test units (a sketch follows below), but I suspect both would be more than a couple of hours of work; I don't know if that's days or weeks. It likely depends on how familiar people are with stestr, and unfortunately there are not many who are.
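A sketch of that hackier route, assuming the test-list chunks have already been written out and using stestr run's --serial and --load-list options. The chunk directory layout is made up for the example, and concurrent runs against a single stestr repository would still need separate repositories or merged results, so treat this as an illustration rather than a drop-in script.

```python
# Pre-generated test-list chunks are consumed by a bounded pool: at most N
# `stestr run --serial --load-list <file>` processes run at a time, and the
# next chunk starts whenever one finishes.
import pathlib
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_chunk(chunk_file):
    # Each chunk is run serially by its own stestr process.
    return subprocess.run(
        ["stestr", "run", "--serial", "--load-list", str(chunk_file)]
    ).returncode


def run_chunks(chunk_dir="chunks", max_workers=8):
    chunk_files = sorted(pathlib.Path(chunk_dir).glob("*.txt"))
    # The executor acts as the bounded pool of workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_chunk, chunk_files))


if __name__ == "__main__":
    print(run_chunks())
```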
This isn't actually a new idea; it's something we initially discussed adding to stestr for Tempest and Nova unit tests 6-7 years ago. I actually have a WIP pull request implementing this as an experimental feature from ~5 years ago here: https://github.com/mtreinish/stestr/pull/271 At the time I got it working on Linux and macOS, but was struggling to get the IPC for result streaming working correctly on Windows. I was eventually going to make the option POSIX-only to side-step this initially, but I never circled back to it because there wasn't a huge demand for the feature and I got distracted by other things. I just updated the branch, and it looks like in the intervening time things have bit-rotted a bit and it's not working at all anymore.

But Clark's analysis is correct: typically, if you have a historical run in the stestr database, the timing-based partitioning strategy does a good enough job with worker balance that this isn't a problem most of the time (which is why the demand for #271 hasn't been so high). To take advantage of this in CI, the trick a lot of people use is to cache the subunit result stream from the run and, before running the tests, call `stestr load` [1] on that cached result stream to populate the local database with historical data. Back in the day we used to do this with the subunit2sql database for OpenStack, but since that's all been retired I'm not sure what the current status of any of this configuration is.

-Matt Treinish
[0] https://stestr.readthedocs.io/en/latest/MANUAL.html#parallel-testing
[1] https://stestr.readthedocs.io/en/stable/MANUAL.html#load
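For anyone wiring up the caching workflow Matt and Clark describe, a minimal sketch of the preseed-and-recache cycle is below. The cache path is made up for the example, and it assumes `stestr load` will read the stream from stdin and that `stestr last --subunit` emits the previous run's stream.

```python
# A minimal sketch of preseeding a fresh checkout's stestr database from a
# CI-cached subunit stream, then exporting the new run for the next build.
import pathlib
import subprocess

CACHE = pathlib.Path("cached-results.subunit")

# Create the local repository if it does not exist yet (ignore if it does).
subprocess.run(["stestr", "init"], check=False)

# Seed the repository with historical timing data from the cached stream.
if CACHE.exists():
    with CACHE.open("rb") as stream:
        subprocess.run(["stestr", "load"], stdin=stream, check=True)

# Run the tests; with history loaded, partitioning uses equal-time buckets.
subprocess.run(["stestr", "run"], check=True)

# Export this run's subunit stream so CI can cache it for the next build.
with CACHE.open("wb") as out:
    subprocess.run(["stestr", "last", "--subunit"], stdout=out, check=True)
```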