These jobs seem to time out on every provider on a regular basis [1], but the issue is surely more apparent with tempest on FN. The result is quite a bit of lost time: 361 jobs that each run for several hours add up to a little over 1,000 hours of lost cycles.
On Fri, Jul 26, 2019 at 04:53:28PM -0700, Clark Boylan wrote:
> Given my change shows this can be so much quicker is there any
> interest in modifying devstack to be faster here? And if so what do
> we think an appropriate approach would be?
My first concern was whether anyone considers openstack-client setting
these things up to be part of the testing itself. I'd say not;
comments in [1] suggest similar views.
My second concern is that we keep sufficient track of complexity vs.
speed. Doing things sequentially via a script is pretty simple to
follow; as we start folding things into larger scripts we make it
harder to debug when a monoscript dies and you have to start pulling
apart where it was. With just a little json fiddling we can currently
pull good stats from logstash ([2]), so I think as we go it would be
good to make sure we account for the time using appropriate wrappers,
etc.
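
To make the "appropriate wrappers" idea concrete, here is a minimal sketch of one possible shape. The names `timed` and `TIMING_LOG` are made up for illustration and are not existing devstack helpers; the idea is only that every batched step records a label and its elapsed seconds so the numbers stay comparable as things move around.

```shell
#!/usr/bin/env bash
# Sketch only: "timed" and TIMING_LOG are hypothetical names.

TIMING_LOG=""   # one "label seconds" line per wrapped command

timed() {
    local label=$1; shift
    local start=$SECONDS rc=0
    # Run the wrapped command, remembering its exit status
    "$@" || rc=$?
    # Record elapsed wall-clock seconds (bash's SECONDS has 1s resolution)
    TIMING_LOG+="$label $((SECONDS - start))"$'\n'
    return "$rc"
}
```

A step would then be run as `timed keystone_setup some_function args...`, and `TIMING_LOG` could be dumped at the end of the run. The wrapper passes the exit status through, so a failing step still fails.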
Then the third concern is not to break anything for plugins --
devstack has a very very loose API which basically relies on plugin
authors using a combination of good taste and copying other code to
decide what's internal or not.
Which made me start wondering: if we look at this closely, might we
make inroads even without replacing things?
For example [3]; it seems like SERVICE_DOMAIN_NAME is never not
default, so the get_or_create_domain call is always just overhead (the
result is never used).
Then it seems that in the gate, basically all of the "get_or_create"
calls will really just be "create" calls, because we're always
starting fresh. So we could cut out about half of the calls there by
skipping the pre-check when we know we're under zuul
(proof-of-concept [4]).
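
As a rough illustration of skipping the pre-check (this is not the code in [4]), the sketch below uses stand-in functions instead of real client calls. `ASSUME_FRESH`, `lookup_id`, `create_id` and `CALLS` are all hypothetical names; how we would actually detect "running under zuul" is left open.

```shell
#!/usr/bin/env bash
# Sketch only: ASSUME_FRESH, lookup_id and create_id are hypothetical
# names, not devstack's real helpers.

CALLS=0   # counts simulated API round trips

lookup_id() {   # stand-in for the "get" half; a fresh cloud has nothing
    CALLS=$((CALLS + 1))
    return 1
}

create_id() {   # stand-in for the "create" half
    CALLS=$((CALLS + 1))
    RESULT_ID="id-$1"
}

get_or_create() {
    local name=$1
    # Only pay for the lookup when the resource might already exist
    if [[ "${ASSUME_FRESH:-False}" != "True" ]]; then
        if lookup_id "$name"; then
            return 0
        fi
    fi
    create_id "$name"
}
```

With `ASSUME_FRESH=True` each `get_or_create` costs one simulated round trip instead of two, which is where the "about half" comes from.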
Then we have blocks like:
    get_or_add_user_project_role $member_role $demo_user $demo_project
    get_or_add_user_project_role $admin_role $admin_user $demo_project
    get_or_add_user_project_role $another_role $demo_user $demo_project
    get_or_add_user_project_role $member_role $demo_user $invis_project
If we wrapped that in something like
    start_osc_session
    ...
    end_osc_session
which sets a variable so that, instead of calling the client directly,
those functions write their arguments to a tmp file. Then, at the end,
end_osc_session does
    $ osc "$(< tmpfile)"
and uses the inbuilt batching? If that halved the calls by skipping
the "get_or" bit, and shared authentication through batching, would
that help?
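
A minimal sketch of the session idea, under stated assumptions: `osc_run`, `osc_batch` and `OSC_SESSION_FILE` are hypothetical names, and `osc_batch` is only a stand-in for a single client invocation that reads its commands from stdin (it counts lines rather than talking to a cloud).

```shell
#!/usr/bin/env bash
# Sketch only: OSC_SESSION_FILE, osc_run and osc_batch are hypothetical.

OSC_SESSION_FILE=""
BATCH_RUNS=0    # how many times the (simulated) client was started
BATCH_LINES=0   # how many commands it was handed in total

osc_batch() {   # stand-in for one client process reading commands on stdin
    local line
    BATCH_RUNS=$((BATCH_RUNS + 1))
    while IFS= read -r line; do
        BATCH_LINES=$((BATCH_LINES + 1))
    done
}

osc_run() {     # what each helper would call instead of the client
    if [[ -n "$OSC_SESSION_FILE" ]]; then
        # Session open: queue the command. Note "$*" flattens quoting,
        # so arguments containing spaces would need extra care.
        echo "$*" >> "$OSC_SESSION_FILE"
    else
        # No session: run immediately, one client start per command
        osc_batch <<< "$*"
    fi
}

start_osc_session() {
    OSC_SESSION_FILE=$(mktemp)
}

end_osc_session() {
    # Hand every queued command to a single client invocation
    osc_batch < "$OSC_SESSION_FILE"
    rm -f "$OSC_SESSION_FILE"
    OSC_SESSION_FILE=""
}
```

Wrapping two `osc_run` calls between `start_osc_session` and `end_osc_session` starts the simulated client once rather than twice. Anything that needs a returned ID straight away would not fit this pattern without more work.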
And then I don't know if all the projects and groups are required for
every devstack run? Maybe someone skilled in the art could do a bit
of an audit and we could cut more of that out too?
So I guess my point is that maybe we could tweak what we have a bit to
make some immediate wins, before anyone has to rewrite too much?
-i
[1] https://review.opendev.org/673018
[2] https://ethercalc.openstack.org/rzuhevxz7793
[3] https://review.opendev.org/673941
[4] https://review.opendev.org/673936