On 2021-02-07 09:28:35 +0200 (+0200), Dmitriy Rabotyagov wrote:
> Once you said that, I looked through the actual code of the
> prepare-workspace-git role more carefully and you're right - all
> actions are made against already cached repos there. However, since
> it mostly uses commands, it would still be far more efficient to
> create a module replacing all of the command/shell tasks so that
> things run in a multiprocess way. Regarding an example, you can take
> any random task from OSA, i.e. [1] - it takes a bit more than 6
> minutes. When load on providers is high (or their volume backend I/O
> is poor), the time increases [...]
Okay, so that's these tasks:

https://opendev.org/zuul/zuul-jobs/src/commit/8bdb2b538c79dd75bac14180b905a1...
https://opendev.org/zuul/zuul-jobs/src/commit/8bdb2b538c79dd75bac14180b905a1...

It's doing a git clone from the cache on the node into the workspace
(in theory from one path to another within the same filesystem, which
should normally just result in git creating hardlinks to the original
objects/packs), and that took 101 seconds to clone 106 repositories.
After that, 83 seconds were spent fixing up configuration on each of
those clones. The longest step does indeed seem to be the 128 seconds
where it pushed updated refs from the cache on the executor over the
network into the prepared workspace on the remote build node.

I wonder if combining these into a single loop could help reduce the
iteration overhead, or whether processing repositories in parallel
would help (if they're limited by I/O bandwidth then I expect not)?

Regardless, yeah, 5m12s does seem like a good chunk of time. On the
other hand, it's worth keeping in mind that's just shy of 3 seconds
per required-project, so like you say, it's mainly impacting jobs with
a massive number of required-projects. A different approach might be
to revisit the list of required-projects for that job and check
whether they're all actually used.
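To make the parallel idea a bit more concrete, a minimal sketch along
these lines could fan the per-repository clones out across a worker
pool (the cache path, workspace path and worker count here are made-up
assumptions for illustration, not what the role actually does today):

    import concurrent.futures
    import subprocess

    # Hypothetical locations; the real role derives these from its own
    # variables.
    CACHE_ROOT = "/opt/git"            # on-node git cache (assumption)
    WORKSPACE_ROOT = "/home/zuul/src"  # workspace root (assumption)

    def clone_one(project):
        # Cloning between two paths on the same filesystem lets git
        # hardlink the objects/packs instead of copying them, so each
        # call is mostly metadata work rather than bulk I/O.
        src = f"{CACHE_ROOT}/{project}"
        dest = f"{WORKSPACE_ROOT}/{project}"
        subprocess.run(["git", "clone", src, dest],
                       check=True, capture_output=True)
        return project

    def clone_all(projects, workers=8):
        # Each clone is an independent git invocation, so threads are
        # enough; the only real contention is disk I/O.
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            for done in pool.map(clone_one, projects):
                print(f"cloned {done}")

Whether something like that actually helps would depend on whether the
per-repository work is dominated by disk I/O or by per-invocation
overhead; if it's the disk, parallelism probably won't buy much.
-- 
Jeremy Stanley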