[all] Gate resources and performance

Dan Smith dms at danplanet.com
Fri Feb 5 14:48:05 UTC 2021


> just wanted to point out the 'node hours' comparison may not be fair
> because what is a typical nova patch or a typical tripleo patch?  The
> number of jobs matched & executed by zuul on a given review will be
> different to another tripleo patch in the same repo depending on the
> files touched or branch (etc.) and will vary even more compared to
> other tripleo repos; I think this is the same for nova or any other
> project with multiple repos.

It is indeed important to note that some projects may have wildly
different numbers depending on what is touched in the patch. Speaking
from experience with Nova, Glance, and QA, most job runs are going to be
the same for anything that touches code. Nova will run only the unit or
functional tests if test files are all you touched, or only the docs job
if docs are all you touched, but otherwise we're pretty much running
everything all the time, AFAIK.
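
To make the file-based matching concrete, here is a rough sketch of how
a Zuul-style file matcher decides which jobs run for a change. The job
names and patterns are hypothetical, not Nova's real configuration. The
rule assumed here is that a job with "files" runs only if some changed
file matches, and a job with "irrelevant files" is skipped only if every
changed file matches.

```python
import re

# Hypothetical job definitions: names and patterns are illustrative only,
# not Nova's actual Zuul configuration.
JOBS = {
    "unit-tests": {"irrelevant_files": [r"^doc/.*$"]},
    "docs-build": {"files": [r"^doc/.*$"]},
    "full-integration": {"irrelevant_files": [r"^doc/.*$", r"^nova/tests/.*$"]},
}


def matches_any(patterns, path):
    """Return True if the path matches at least one of the regexes."""
    return any(re.search(p, path) for p in patterns)


def job_runs(job, changed_files):
    """Decide whether one job should run for the given changed files."""
    files = job.get("files")
    # With a "files" matcher, at least one changed file must match.
    if files and not any(matches_any(files, f) for f in changed_files):
        return False
    irrelevant = job.get("irrelevant_files")
    # With an "irrelevant files" matcher, skip only if ALL files match.
    if irrelevant and all(matches_any(irrelevant, f) for f in changed_files):
        return False
    return True


def jobs_for_change(changed_files):
    """Return the sorted names of the jobs that would run."""
    return sorted(n for n, j in JOBS.items() if job_runs(j, changed_files))
```

With these made-up patterns, a docs-only change runs just "docs-build",
a tests-only change runs just "unit-tests", and anything touching real
code runs both "unit-tests" and "full-integration", which is the
"running everything" case.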

That could be an area for improvement for us, although I think that
determining the scope by the files changed is hard for us just because
of how intertwined things are, so we probably need to figure out how to
target our tests another way. And basically all of Nova is in a single
repo. But yes, totally fair point. I picked a couple of test runs at
random to generate these numbers, choosing ones that looked like they
were running most/all of what is configured. The first time I did that
I picked a stable Neutron patch from before they dropped some testing
and got a sky-high number of 54h for a single patch run. So clearly it
can vary :)
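
For reference, a minimal sketch of the kind of "node hours" arithmetic
being discussed, assuming the figure for one patch run is the sum over
its jobs of run time multiplied by the number of nodes the job holds.
This is not the actual script used for the numbers in this thread, and
the job names and durations below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One job execution for a patch (all field values are hypothetical)."""
    name: str
    duration_seconds: float
    node_count: int = 1


def node_hours(runs):
    """Sum duration times node count over all jobs, converted to hours."""
    total_seconds = sum(r.duration_seconds * r.node_count for r in runs)
    return total_seconds / 3600.0


# Invented example data, not real gate measurements.
example = [
    JobRun("single-node-job", 5400, 1),  # 1.5 h on one node
    JobRun("multinode-job", 7200, 2),    # 2 h on each of two nodes
]

print(node_hours(example))  # 5.5
```

Under that definition a multinode job counts once per node, so two
patches that run a different mix of jobs can differ widely even inside
the same repo.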

> ACK. We have recently completed some work (as I said, this is an
> ongoing issue/process for us) at [1][2] to remove some redundant jobs
> which should start to help. Mohamed (mnaser o/) has reached out about
> this and joined our most recent irc meeting [3]. We've already
> prioritized some more cleanup work for this sprint including checking
> file patterns (e.g. started at [4]), tempest tests and removing
> many/all of our non-voting jobs as a first pass. Hope that at least
> starts to address your concern,

Yep, and thanks a lot for what you've done and continue to do. Obviously
looking at the "tripleo is ~40%" report, I expected my script to show
tripleo as having some insanely high test load. Looking at the actual
numbers, it's clear that you're not only not the heaviest but, given
what we know to be a super heavy process of deploying nodes like you do,
seemingly relatively efficient. I'm sure there are still improvements
that could be made on top of your current list, but I think the lesson in
these numbers is that we definitely need to look elsewhere than the
traditional openstack pastime of blaming tripleo ;)

For my part so far, I've got a stack of patches proposed to make
devstack run quite a bit faster for jobs that use it:

https://review.opendev.org/q/topic:%2522async%2522+status:open+project:openstack/devstack

and I've also proposed that nova stop running two grenade jobs that
overlap almost 100% (which strangely has to be a change in the tempest
repo):

https://review.opendev.org/c/openstack/tempest/+/771499

Both of these have barriers to approval at the moment, but both have big
multipliers capable of making a difference.

--Dan


