[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo

James E. Blair corvus at inaugust.com
Fri Jan 8 20:13:36 UTC 2016


Hi,

With the new version of Gerrit offering built-in "git gc" capability, we
looked at the current state of our git repo maintenance.  We run "git
repack -afd" weekly in an attempt to produce the smallest packfiles
possible, but it does not prune loose objects, which seems to be the
main thing "git gc" does that we are missing.

Some (relatively) quick experimentation suggests that various
combinations of "git gc", "git repack", "git prune", "git prune-packed"
all have effects on the overall repo size, the number of pack files, and
the number of loose objects.
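
One way to compare those combinations is to record the same three figures
before and after each command. The sketch below reads them from
"git count-objects -v", whose "size" and "size-pack" fields are reported in
KiB. The function names (parse_count_objects, repo_stats, summarize) are
made up for illustration and are not part of any existing infra tooling.

```python
import subprocess


def parse_count_objects(text):
    """Parse the output of `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only the numeric "key: value" lines.
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def repo_stats(repo_path):
    """Run `git count-objects -v` in repo_path (bare or non-bare)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_count_objects(out)


def summarize(stats):
    """Reduce the raw fields to the figures discussed above."""
    loose_kib = stats.get("size", 0)
    pack_kib = stats.get("size-pack", 0)
    return {
        "loose_objects": stats.get("count", 0),
        "packed_objects": stats.get("in-pack", 0),
        "packs": stats.get("packs", 0),
        "total_kib": loose_kib + pack_kib,
    }
```

Calling summarize(repo_stats(path)) on a scratch copy of the repo before and
after each candidate command gives comparable numbers. Note that this only
counts object storage; it says nothing about refs or network behavior.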

However, we don't just want to find the thing that makes the smallest
repo size (that's easy: "git prune; git gc" -- 394M for nova; one
packfile with all objects and one packed-refs file with all refs)
because this repo is used as the basis of all of our mirrors and is
accessed over several protocols.  It's not immediately clear what the
right optimization is for our situation -- we don't necessarily want to
trade on-disk size for reduced network performance.  Even the packing of
refs isn't entirely straightforward -- while we haven't needed to for
some time, we have, in the past, removed refs.

We're looking for a volunteer to really dig into this problem and
thoroughly evaluate the implications of different ways of optimizing the
repo.  If you're interested, you can download a snapshot of the full
nova repository from Gerrit (it is a point-in-time snapshot and will not
be updated) at this URL:

  http://tarballs.openstack.org/ci/nova.git.tar.bz2

Please follow up on this message if you are interested, and share any
findings.

Thanks,

Jim


