[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo

Arie Bregman abregman at redhat.com
Wed Mar 30 08:22:34 UTC 2016


On Sat, Mar 26, 2016 at 3:47 AM, Zaro <zaro0508 at gmail.com> wrote:
> So I've been researching this and I've found that there is a
> significant performance improvement after running git gc on this nova
> repo.  Below are my results.
>
> File sizes of repo as-is:
> ~/nova.git.orig$ du -hsx * | sort -r | head -10
> 6.4G objects
> 6.1M info
> 4.0K config
> 4.0K HEAD
> 382M refs
> 2.1M logs
>  0B hooks
>  0B description
>  0B branches
>
> Note that the repo as-is has already been through a 'git repack -afd'.
>
>
> File sizes after running 'jgit gc':
> ~/nova.git.test$ du -hsx * | sort -r | head -10
> 6.1M packed-refs
> 6.1M info
> 420M objects
> 4.0K config
> 4.0K HEAD
> 2.1M logs
>  0B refs
>  0B hooks
>  0B description
>  0B branches
>
> The result is that the gc cleans up the objects (6.4G -> 420M) and
> moves the loose ref objects from 'refs' dir to a 'packed-refs' file
> (382M -> 6.1M).
>
> Note that I'm using jgit because that's what Gerrit would use to do
> the 'gc'.  The jgit version is 4.0.1.201506240215-r which is the one
> that's packaged with our current version of Gerrit
> (2.11.4-11-ga14450f) on review.o.o
>
>
> Here I've tested the performance of the git clone, fetch and push
> before and after running 'jgit gc':
>
> `git clone`
> ------------
> before:
> real  3m30.163s
> user 0m2.020s
> sys   3m15.087s
>
> after:
> real  0m0.925s
> user 0m0.406s
> sys   0m0.621s
>
>
> `git fetch origin stable/liberty`
> ---------------------------------
> before:
> real  0m4.271s
> user 0m0.701s
> sys   0m2.949s
>
> after:
> real  0m0.686s
> user 0m0.348s
> sys   0m0.307s
>
>
> `git push origin HEAD:refs/for/master`
> --------------------------------------
> before:
> real  0m36.454s
> user 0m5.346s
> sys   0m27.598s
>
> after:
> real  0m16.588s
> user 0m11.731s
> sys   0m3.218s
>
> Note: I pushed the exact same change for both scenarios.
>
>
> Conclusion:
> The results indicate that it would be very advantageous to run 'git gc'
> for both file size reduction and improved performance. Below are
> additional resources that I've found on the internet that seem to
> back up my results.
>
>
>
> references:
>
> This says that one-file-per-ref format both wastes storage and hurts
> performance:  https://git-scm.com/docs/git-pack-refs
>
> This outlines some of the benefits and drawbacks of packed-refs file:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65722.html
>
> Info on speeding up clones/fetches with pack bitmaps:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65571.html
>
> On Fri, Jan 8, 2016 at 12:13 PM, James E. Blair <corvus at inaugust.com> wrote:
>> Hi,
>>
>> With the new version of Gerrit offering built-in "git gc" capability, we
>> looked at the current state of our git repo maintenance.  We run "git
>> repack -afd" weekly in an attempt to produce the smallest packfiles
>> possible, but it does not prune loose objects, which seems to be the
>> main thing "git gc" does that we are missing.
>>
>> Some (relatively) quick experimentation suggests that various
>> combinations of "git gc", "git repack", "git prune", "git prune-packed"
>> all have effects on the overall repo size, the number of pack files, and
>> the number of loose objects.
>>
>> However, we don't just want to find the thing that makes the smallest
>> repo size (that's easy: "git prune; git gc" -- 394M for nova; one
>> packfile with all objects and one packed-refs file with all refs)
>> because this repo is used as the basis of all of our mirrors and is
>> accessed over several protocols.  It's not immediately clear what the
>> right optimization is for our situation -- we don't necessarily want to
>> trade on-disk size for reduced network performance.  Even the packing of
>> refs isn't entirely straightforward -- while we haven't needed to for
>> some time, we have, in the past, removed refs.
>>
>> We're looking for a volunteer to really dig into this problem and
>> thoroughly evaluate the implications of different ways of optimizing the
>> repo.  If you're interested, you can download a snapshot of the full
>> nova repository from Gerrit (it is a point-in-time snapshot and will not
>> be updated) at this URL:
>>
>>   http://tarballs.openstack.org/ci/nova.git.tar.bz2
>>
>> Please follow up this message if you are interested and with any
>> findings.
>>
>> Thanks,
>>
>> Jim
>>
>> _______________________________________________
>> OpenStack-Infra mailing list
>> OpenStack-Infra at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Zaro, thanks for sharing test results.

I tested it on the neutron project.

Before 'git gc':

neutron total size: 65M

Files:
----------------------------------------
4.0K .git/branches
4.0K .git/config
4.0K .git/description
4.0K .git/HEAD
44K .git/hooks
152K .git/index
8.0K .git/info
32K .git/logs
50M .git/objects
12K .git/packed-refs
28K .git/refs
----------------------------------------

After running 'git gc --aggressive':

neutron total size: 47M

4.0K .git/branches
4.0K .git/config
4.0K .git/description
4.0K .git/HEAD
44K .git/hooks
152K .git/index
24K .git/info
32K .git/logs
32M .git/objects
12K .git/packed-refs
24K .git/refs
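
For anyone who wants to repeat this kind of before/after size comparison, here is a rough sketch of how it could be scripted. It is not the exact procedure used for the numbers above: the function names dir_kib and gc_size_report are hypothetical, and it assumes a non-bare clone and a git new enough to support 'git -C' (1.8.5 or later).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from this thread).

# Print the size of a directory in KiB.
dir_kib() {
    du -sk "$1" | cut -f1
}

# Print the .git size of a non-bare clone before and after an aggressive gc.
gc_size_report() {
    repo=$1
    before=$(dir_kib "$repo/.git")
    # --aggressive spends more CPU time looking for better deltas
    git -C "$repo" gc --aggressive --quiet
    after=$(dir_kib "$repo/.git")
    echo "before: ${before}K  after: ${after}K"
}
```

It would be called with the path to a local clone, for example 'gc_size_report ./neutron'.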

Each command executed 3 times:

--- git clone before gc ---
10.59s user 0.74s system 195% cpu 5.785 total
12.80s user 0.63s system 205% cpu 6.554 total
12.27s user 0.61s system 202% cpu 5.849 total

--- git clone after gc ---
8.69s user 0.52s system 149% cpu 6.178 total
8.61s user 0.55s system 175% cpu 5.230 total
8.62s user 0.51s system 187% cpu 4.877 total

--- git fetch origin stable/liberty before gc ---
0.05s user 0.04s system 4% cpu 1.850 total
0.05s user 0.04s system 4% cpu 1.899 total
0.04s user 0.05s system 4% cpu 1.840 total

--- git fetch origin stable/liberty after gc ---
0.01s user 0.01s system 9% cpu 0.245 total
0.02s user 0.01s system 12% cpu 0.173 total
0.01s user 0.01s system 11% cpu 0.193 total

--- git push origin HEAD:refs/for/master before gc ---
0.05s user 0.04s system 4% cpu 1.850 total
0.03s user 0.04s system 4% cpu 1.899 total
0.05s user 0.05s system 3% cpu 1.573 total

--- git push origin HEAD:refs/for/master after gc ---
0.01s user 0.00s system 12% cpu 0.142 total
0.01s user 0.01s system 12% cpu 0.178 total
0.01s user 0.01s system 11% cpu 0.183 total
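
A minimal sketch of how repeated timings like these could be collected, assuming bash (it relies on the bash 'time' keyword and the TIMEFORMAT variable). The helper name time_runs is hypothetical, and it prints only wall-clock seconds, so its output does not look like the zsh lines above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print the wall-clock
# seconds of each run, one per line. The command's own output is discarded.
time_runs() {
    local runs=$1; shift
    local TIMEFORMAT='%R'   # bash: report only real (elapsed) time
    local i
    for ((i = 0; i < runs; i++)); do
        # 'time' reports on stderr; send that report to stdout
        { time "$@" >/dev/null 2>&1; } 2>&1
    done
}
```

For example, 'time_runs 3 git fetch origin stable/liberty'. For 'git clone' the target directory would have to be removed between runs, and a repeated fetch has nothing new to download after the first run unless the local state is reset.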

I also ran a quick test on an OpenStack Infra project (project-config):

Before gc:             97M project-config
After gc --aggressive: 19M project-config

--- git clone before gc ---
7.81s user 0.52s system 146% cpu 5.677 total
6.91s user 0.48s system 144% cpu 5.112 total
7.43s user 0.66s system 147% cpu 5.496 total

--- git clone after gc ---
6.39s user 0.56s system 139% cpu 4.965 total
6.32s user 0.51s system 130% cpu 5.218 total
6.39s user 0.55s system 127% cpu 5.431 total
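
To make the three-run samples easier to compare, here is a small sketch that averages the wall-clock ("total") column of zsh-style time lines. The helper name mean_total is hypothetical; the input lines are the project-config clone timings quoted above.

```shell
#!/bin/sh
# Hypothetical helper: average the wall-clock ("total") column of
# zsh-style time lines read from stdin.
mean_total() {
    awk '$NF == "total" { sum += $(NF - 1); n++ }
         END { if (n) printf "%.3f\n", sum / n }'
}

# project-config clone, before gc
printf '%s\n' \
    '7.81s user 0.52s system 146% cpu 5.677 total' \
    '6.91s user 0.48s system 144% cpu 5.112 total' \
    '7.43s user 0.66s system 147% cpu 5.496 total' | mean_total   # 5.428

# project-config clone, after gc
printf '%s\n' \
    '6.39s user 0.56s system 139% cpu 4.965 total' \
    '6.32s user 0.51s system 130% cpu 5.218 total' \
    '6.39s user 0.55s system 127% cpu 5.431 total' | mean_total   # 5.205
```

With only three runs per case these means are rough, but they suggest that for project-config the gain is mostly in disk usage rather than clone time.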

Cheers,

Arie Bregman


