[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo

Zaro zaro0508 at gmail.com
Fri Jun 10 19:20:01 UTC 2016

Hello All.

My previous testing on this was sorta bogus because I was cloning with
a file reference to the repo, which isn't a use case that OpenStack
supports.  It's also why I saw such a significant clone performance
difference between a GCed repo and a non-GCed repo.

This is a redo, and this time I've tested the use cases that infra does
support.  Just about all of our community (bots and people) clone from
our git mirrors (git.openstack.org) and not directly from our Gerrit
server (review.o.o).  Thus it's much more realistic to verify git
performance from our git mirrors rather than from review.o.o.  This
next set of test results attempts to simulate the performance of
cloning the nova repo from git.o.o.  Since git.o.o allows git
interactions using a few different protocols (git, http smart, and
http dumb), I have attempted to test cloning using each protocol.

Test Environment:
I set up a test CentOS 7 VM server (1 vCPU, 10G RAM) to host two nova
repos: one repo was not GCed (nova-nogc) and the second repo was GCed
(nova-gc).  The GC was done using the C git client (`git gc`)
packaged with CentOS.  Both repos can be cloned using the git, http
smart or http dumb protocols.  I cloned the repos directly on the host
machine for my tests.
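The post doesn't include the benchmark procedure itself. Below is a minimal Python sketch of one way to reproduce the "average clone time (5 runs)" measurement, assuming a `git` binary on PATH; timing wall-clock around `git clone` into a fresh directory per run is my assumption, not necessarily what was done here. Note that the client doesn't choose between http smart and http dumb: git tries the smart protocol and falls back to dumb if the server doesn't offer it, so that distinction comes from the server configuration.

```python
import shutil
import subprocess
import tempfile
import time
from statistics import mean
from typing import Callable


def clone_once(url: str) -> float:
    """Clone `url` into a throwaway directory and return wall-clock seconds."""
    workdir = tempfile.mkdtemp(prefix="clone-bench-")
    try:
        start = time.monotonic()
        subprocess.run(["git", "clone", "--quiet", url, f"{workdir}/repo"],
                       check=True)
        return time.monotonic() - start
    finally:
        # Delete the clone so every run starts from an empty directory.
        shutil.rmtree(workdir, ignore_errors=True)


def average_clone_time(url: str, runs: int = 5,
                       clone: Callable[[str], float] = clone_once) -> float:
    """Average wall-clock time of `runs` sequential clones of `url`."""
    return mean(clone(url) for _ in range(runs))


def fmt_duration(seconds: float) -> str:
    """Format seconds the way the results table does, e.g. 153 -> '2m 33 sec'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs} sec"
```

Usage would be something like `fmt_duration(average_clone_time("git://testvm/nova-gc"))`, where `testvm` and the repo path are hypothetical placeholders for the test server. The `clone` parameter is only there so the averaging logic can be exercised without a network.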


repo      | protocol   | avg clone time (5 runs) | disk used after clone | RAM usage
----------+------------+-------------------------+-----------------------+----------
nova-gc   | http dumb  | 2m 33 sec               | 409M                  | 1% ~200M
nova-nogc | http dumb  | 2m 33 sec               | 409M                  | 1% ~200M
nova-gc   | http smart | 3m 5 sec                | 147M                  | 4% ~500M
nova-nogc | http smart | 3m 15 sec               | 147M                  | 4% ~500M
nova-gc   | git        | 3m 4 sec                | 147M                  | 4% ~500M
nova-nogc | git        | 3m 12 sec               | 147M                  | 4% ~500M
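To put numbers on the GC vs non-GC comparison, here is a small Python sketch that converts the table's clone times to seconds and computes how much faster the GCed repo cloned for each protocol. The values are copied from the table; the helper names are illustrative only.

```python
import re

# Average clone times copied from the results table (5 runs each).
RESULTS = {
    ("http dumb", "nova-gc"): "2m 33 sec",
    ("http dumb", "nova-nogc"): "2m 33 sec",
    ("http smart", "nova-gc"): "3m 5 sec",
    ("http smart", "nova-nogc"): "3m 15 sec",
    ("git", "nova-gc"): "3m 4 sec",
    ("git", "nova-nogc"): "3m 12 sec",
}


def to_seconds(text: str) -> int:
    """Convert a table entry such as '3m 15 sec' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+) sec", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def gc_speedup(protocol: str) -> tuple[int, float]:
    """Seconds saved by the GCed repo, and that saving as % of the non-GC time."""
    nogc = to_seconds(RESULTS[(protocol, "nova-nogc")])
    gc = to_seconds(RESULTS[(protocol, "nova-gc")])
    return nogc - gc, 100.0 * (nogc - gc) / nogc


for proto in ("http dumb", "http smart", "git"):
    delta, pct = gc_speedup(proto)
    print(f"{proto:10}  GC saves {delta:2d} s ({pct:.1f}%)")
```

This works out to 0 s for http dumb, 10 s (about 5.1%) for http smart and 8 s (about 4.2%) for git. With only five runs per cell and no spread reported, gaps of that size are small relative to a roughly 3 minute clone.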

The conclusion I draw from the test results is that there is
essentially no performance difference between cloning a nova repo
as-is (`git repack -afd`) and cloning a nova repo that has gone
through garbage collection (`git gc`); the largest gap above is 10
seconds out of roughly 3 minutes.  The main difference is that we
would save a significant amount of disk space on the servers (7G for
nova-nogc vs 400M for nova-gc).  I guess garbage collection is mostly
about reducing repo size and doesn't really do much to improve git
performance.  The only realized performance gain I see is that smaller
repos would probably speed up Gerrit replication to all of our git
slaves.
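To check the on-disk footprint of the server-side repos (the 7G vs 400M comparison), one option is `git count-objects -v`, which reports loose-object size (`size`), packed size (`size-pack`) and garbage (`size-garbage`) in KiB. A minimal Python sketch that runs it and sums those fields follows; note that it only counts object storage, not refs or other files in the repo directory, so `du -sh` on the repo remains the simpler whole-directory check.

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats: dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            stats[key.strip()] = int(value.strip())
        except ValueError:
            # e.g. an 'alternate:' line carries a path, not a number
            continue
    return stats


def object_kib(stats: dict[str, int]) -> int:
    """KiB used by loose objects, packs and garbage files (no -H, so plain KiB)."""
    return sum(stats.get(k, 0) for k in ("size", "size-pack", "size-garbage"))


def repo_object_kib(repo_path: str) -> int:
    """Run `git count-objects -v` in `repo_path` and return total object KiB."""
    out = subprocess.run(["git", "-C", repo_path, "count-objects", "-v"],
                         check=True, capture_output=True, text=True).stdout
    return object_kib(parse_count_objects(out))
```

For example, `repo_object_kib("/srv/git/nova-gc")`, where the path is a hypothetical location for the server-side repo; running it on both repos before and after `git gc` shows where the space goes.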

