[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo
Boris Pavlovic
bpavlovic at mirantis.com
Sun Mar 27 03:59:20 UTC 2016
Zaro,
Thank you for the research. It seems we should definitely run gc against
the nova repo.
Best regards,
Boris Pavlovic
On Fri, Mar 25, 2016 at 5:47 PM, Zaro <zaro0508 at gmail.com> wrote:
> So I've been researching this and I've found that there is a
> significant performance improvement after running git gc on this nova
> repo. Below are my results.
>
> File sizes of repo as-is:
> ~/nova.git.orig$ du -hsx * | sort -r | head -10
> 6.4G objects
> 6.1M info
> 4.0K config
> 4.0K HEAD
> 382M refs
> 2.1M logs
> 0B hooks
> 0B description
> 0B branches
>
> Note that the repo as-is has already been through a 'git repack -afd'.
>
>
> File sizes after running 'jgit gc':
> ~/nova.git.test$ du -hsx * | sort -r | head -10
> 6.1M packed-refs
> 6.1M info
> 420M objects
> 4.0K config
> 4.0K HEAD
> 2.1M logs
> 0B refs
> 0B hooks
> 0B description
> 0B branches
>
> The result is that the gc shrinks the 'objects' directory (6.4G -> 420M)
> and moves the loose refs from the 'refs' directory into a single
> 'packed-refs' file (382M -> 6.1M).
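A minimal sketch of the arithmetic behind those two size changes. The `reduction` helper is a hypothetical name introduced only for illustration, and the inputs are the du values quoted above converted to megabytes, assuming 1G = 1024M:

```shell
#!/bin/sh
# reduction BEFORE_MB AFTER_MB
# Prints the shrink factor and the percentage saved.
# Hypothetical helper; inputs are sizes in megabytes from the du output.
reduction() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%.1fx smaller, %.1f%% saved\n", before / after, (1 - after / before) * 100
    }'
}

# objects: 6.4G (= 6553.6M) -> 420M
reduction 6553.6 420
# refs: 382M of loose ref files -> 6.1M packed-refs file
reduction 382 6.1
```

Since du -h rounds its output, the results are only approximate.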
>
> Note that I'm using jgit because that's what Gerrit would use to do
> the 'gc'. The jgit version is 4.0.1.201506240215-r which is the one
> that's packaged with our current version of Gerrit
> (2.11.4-11-ga14450f) on review.o.o
>
>
> Here I've tested the performance of the git clone, fetch and push
> before and after running 'jgit gc':
>
> `git clone`
> ------------
> before:
> real 3m30.163s
> user 0m2.020s
> sys 3m15.087s
>
> after:
> real 0m0.925s
> user 0m0.406s
> sys 0m0.621s
>
>
> `git fetch origin stable/liberty`
> ---------------------------------
> before:
> real 0m4.271s
> user 0m0.701s
> sys 0m2.949s
>
> after:
> real 0m0.686s
> user 0m0.348s
> sys 0m0.307s
>
>
> `git push origin HEAD:refs/for/master`
> --------------------------------------
> before:
> real 0m36.454s
> user 0m5.346s
> sys 0m27.598s
>
> after:
> real 0m16.588s
> user 0m11.731s
> sys 0m3.218s
>
> Note: I pushed the exact same change for both scenarios.
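To compare the three operations on one scale, the wall-clock ("real") times above can be turned into speedup factors. This is a rough sketch; `to_seconds` and `speedup` are hypothetical helper names, and only the timings quoted above are used:

```shell
#!/bin/sh
# to_seconds converts a `time`-style value such as 3m30.163s to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, parts, "m")          # parts[1] = minutes, parts[2] = "30.163s"
        sub(/s$/, "", parts[2])       # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# speedup BEFORE AFTER prints how many times faster the "after" run was.
speedup() {
    awk -v b="$(to_seconds "$1")" -v a="$(to_seconds "$2")" \
        'BEGIN { printf "%.1fx\n", b / a }'
}

speedup 3m30.163s 0m0.925s    # git clone
speedup 0m4.271s  0m0.686s    # git fetch origin stable/liberty
speedup 0m36.454s 0m16.588s   # git push origin HEAD:refs/for/master
```

These ratios cover real time only. For the push, user time actually rose (5.346s to 11.731s) while sys time dropped, so the factor for push hides a shift in where the time is spent.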
>
>
> Conclusion:
> The results indicate that it would be very advantageous to run 'git gc'
> for both file size reduction and improved performance. Below are
> additional resources that I've found on the internet that seem to
> back up my results.
>
>
>
> references:
>
> This says that one-file-per-ref format both wastes storage and hurts
> performance: https://git-scm.com/docs/git-pack-refs
>
> This outlines some of the benefits and drawbacks of packed-refs file:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65722.html
>
> Info on speeding up clones/fetches with pack bitmaps:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65571.html
>
> On Fri, Jan 8, 2016 at 12:13 PM, James E. Blair <corvus at inaugust.com>
> wrote:
> > Hi,
> >
> > With the new version of Gerrit offering built-in "git gc" capability, we
> > looked at the current state of our git repo maintenance. We run "git
> > repack -afd" weekly in an attempt to produce the smallest packfiles
> > possible, but it does not prune loose objects, which seems to be the
> > main thing "git gc" does that we are missing.
> >
> > Some (relatively) quick experimentation suggests that various
> > combinations of "git gc", "git repack", "git prune", "git prune-packed"
> > all have effects on the overall repo size, the number of pack files, and
> > the number of loose objects.
> >
> > However, we don't just want to find the thing that makes the smallest
> > repo size (that's easy: "git prune; git gc" -- 394M for nova; one
> > packfile with all objects and one packed-refs file with all refs)
> > because this repo is used as the basis of all of our mirrors and is
> > accessed over several protocols. It's not immediately clear what the
> > right optimization is for our situation -- we don't necessarily want to
> > trade on-disk size for reduced network performance. Even the packing of
> > refs isn't entirely straightforward -- while we haven't needed to for
> > some time, we have, in the past, removed refs.
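Before settling on a ref-packing strategy, it may help to see how a repository's refs are currently split between loose files and the packed-refs file. The sketch below assumes a bare repository layout (refs/ and packed-refs directly under the given directory). `count_loose_refs` and `count_packed_refs` are hypothetical helper names:

```shell
#!/bin/sh
# count_loose_refs GIT_DIR: number of one-file-per-ref entries under refs/.
count_loose_refs() {
    find "$1/refs" -type f 2>/dev/null | wc -l | tr -d ' '
}

# count_packed_refs GIT_DIR: number of refs listed in packed-refs.
# Lines starting with '#' (header) or '^' (peeled tag targets) are not refs.
count_packed_refs() {
    if [ -f "$1/packed-refs" ]; then
        grep -c -v '^[#^]' "$1/packed-refs" || true
    else
        echo 0
    fi
}
```

For example, `count_loose_refs ~/nova.git.orig` and `count_packed_refs ~/nova.git.test` could be run against the two copies of the snapshot. A ref can exist both as a loose file and in packed-refs (the loose file takes precedence), so the two counts may overlap.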
> >
> > We're looking for a volunteer to really dig into this problem and
> > thoroughly evaluate the implications of different ways of optimizing the
> > repo. If you're interested, you can download a snapshot of the full
> > nova repository from Gerrit (it is a point-in-time snapshot and will not
> > be updated) at this URL:
> >
> > http://tarballs.openstack.org/ci/nova.git.tar.bz2
> >
> > Please follow up this message if you are interested and with any
> > findings.
> >
> > Thanks,
> >
> > Jim
> >
> > _______________________________________________
> > OpenStack-Infra mailing list
> > OpenStack-Infra at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra