[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo
Zaro
zaro0508 at gmail.com
Sat Mar 26 00:47:39 UTC 2016
So I've been researching this and I've found that there is a
significant performance improvement after running git gc on this nova
repo. Below are my results.
File sizes of repo as-is:
~/nova.git.orig$ du -hsx * | sort -r | head -10
6.4G objects
6.1M info
4.0K config
4.0K HEAD
382M refs
2.1M logs
0B hooks
0B description
0B branches
Note that the repo as-is has already been through a 'git repack -afd'.
File sizes after running 'jgit gc':
~/nova.git.test$ du -hsx * | sort -r | head -10
6.1M packed-refs
6.1M info
420M objects
4.0K config
4.0K HEAD
2.1M logs
0B refs
0B hooks
0B description
0B branches
The result is that the gc cleans up the objects (6.4G -> 420M) and
moves the loose refs from the 'refs' dir into a single 'packed-refs'
file (382M -> 6.1M).
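To see how many refs sit in each storage format before and after a gc, something like the following could be used. This is only a sketch: the helper names count_loose_refs and count_packed_refs are made up for illustration, and it assumes the standard layout of a bare repo (one file per loose ref under 'refs', one line per ref in 'packed-refs', with '#' header lines and '^' peeled-tag lines that are not refs themselves).

```python
import os
import sys


def count_loose_refs(git_dir):
    """Count refs stored one-file-per-ref under <git_dir>/refs."""
    total = 0
    for _root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        total += len(files)
    return total


def count_packed_refs(git_dir):
    """Count ref entries listed in <git_dir>/packed-refs (0 if absent)."""
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return 0
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # '#' lines are header comments; '^' lines hold the peeled
            # object for the annotated tag on the preceding line.
            if line.strip() and not line.startswith(("#", "^")):
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: python count_refs.py ~/nova.git.orig
    git_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print("loose refs: ", count_loose_refs(git_dir))
    print("packed refs:", count_packed_refs(git_dir))
```

Run against both copies of the repo, I'd expect the loose count to drop to zero and the packed count to pick up the difference after the gc, but I haven't included numbers here since they depend on the snapshot.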
Note that I'm using jgit because that's what Gerrit would use to do
the 'gc'. The jgit version is 4.0.1.201506240215-r, which is the one
that's packaged with our current version of Gerrit
(2.11.4-11-ga14450f) on review.o.o.
Here I've tested the performance of the git clone, fetch and push
before and after running 'jgit gc':
`git clone`
------------
before:
real 3m30.163s
user 0m2.020s
sys 3m15.087s
after:
real 0m0.925s
user 0m0.406s
sys 0m0.621s
`git fetch origin stable/liberty`
---------------------------------
before:
real 0m4.271s
user 0m0.701s
sys 0m2.949s
after:
real 0m0.686s
user 0m0.348s
sys 0m0.307s
`git push origin HEAD:refs/for/master`
--------------------------------------
before:
real 0m36.454s
user 0m5.346s
sys 0m27.598s
after:
real 0m16.588s
user 0m11.731s
sys 0m3.218s
Note: I pushed the exact same change for both scenarios.
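For anyone who wants the timings above as ratios, here is a small sketch that converts the 'real' values to seconds and divides before by after. The values are copied from the results above; the function names to_seconds and speedup are just for illustration.

```python
import re

# Wall-clock ("real") times copied from the measurements above:
# command -> (before gc, after gc)
TIMINGS = {
    "git clone": ("3m30.163s", "0m0.925s"),
    "git fetch origin stable/liberty": ("0m4.271s", "0m0.686s"),
    "git push origin HEAD:refs/for/master": ("0m36.454s", "0m16.588s"),
}


def to_seconds(stamp):
    """Convert a shell `time` value such as '3m30.163s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(before, after):
    """Return how many times faster the 'after' run was."""
    return to_seconds(before) / to_seconds(after)


if __name__ == "__main__":
    for command, (before, after) in TIMINGS.items():
        print(f"{command}: {speedup(before, after):.1f}x faster")
```

That works out to roughly 227x for the clone, about 6x for the fetch and about 2x for the push. Keep in mind these are single runs, so treat the ratios as rough.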
Conclusion:
The results indicate that it would be very advantageous to run 'git gc'
for both file size reduction and improved performance. Below are
additional resources that I've found on the internet that seem to
back up my results.
References:
This says that one-file-per-ref format both wastes storage and hurts
performance: https://git-scm.com/docs/git-pack-refs
This outlines some of the benefits and drawbacks of packed-refs file:
https://www.mail-archive.com/git%40vger.kernel.org/msg65722.html
Info on speeding up clones/fetches with pack bitmaps:
https://www.mail-archive.com/git%40vger.kernel.org/msg65571.html
On Fri, Jan 8, 2016 at 12:13 PM, James E. Blair <corvus at inaugust.com> wrote:
> Hi,
>
> With the new version of Gerrit offering built-in "git gc" capability, we
> looked at the current state of our git repo maintenance. We run "git
> repack -afd" weekly in an attempt to produce the smallest packfiles
> possible, but it does not prune loose objects, which seems to be the
> main thing "git gc" does that we are missing.
>
> Some (relatively) quick experimentation suggests that various
> combinations of "git gc", "git repack", "git prune", "git prune-packed"
> all have effects on the overall repo size, the number of pack files, and
> the number of loose objects.
>
> However, we don't just want to find the thing that makes the smallest
> repo size (that's easy: "git prune; git gc" -- 394M for nova; one
> packfile with all objects and one packed-refs file with all refs)
> because this repo is used as the basis of all of our mirrors and is
> accessed over several protocols. It's not immediately clear what the
> right optimization is for our situation -- we don't necessarily want to
> trade on-disk size for reduced network performance. Even the packing of
> refs isn't entirely straightforward -- while we haven't needed to for
> some time, we have, in the past, removed refs.
>
> We're looking for a volunteer to really dig into this problem and
> thoroughly evaluate the implications of different ways of optimizing the
> repo. If you're interested, you can download a snapshot of the full
> nova repository from Gerrit (it is a point-in-time snapshot and will not
> be updated) at this URL:
>
> http://tarballs.openstack.org/ci/nova.git.tar.bz2
>
> Please follow up this message if you are interested and with any
> findings.
>
> Thanks,
>
> Jim
>
> _______________________________________________
> OpenStack-Infra mailing list
> OpenStack-Infra at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra