[OpenStack-Infra] Fun (important!) project: optimize Gerrit's nova git repo

Zaro zaro0508 at gmail.com
Mon Jun 13 16:29:21 UTC 2016


`git gc` prunes by default [1]. Running `git gc` shrinks the objects
directory (6.4G -> 380M) and moves the refs into the packed-refs file
(382M -> 6.1M). I see exactly the same result with C git and with jgit.

Original files:
  ~/temp/nova.git.test$ du -hsx * | sort -r | head -10
  6.4G nova.git.orig/objects
  6.1M nova.git.orig/info
  4.0K nova.git.orig/config
  4.0K nova.git.orig/HEAD
  382M nova.git.orig/refs
  2.1M nova.git.orig/logs
  0B nova.git.orig/hooks
  0B nova.git.orig/description
  0B nova.git.orig/branches

After a `git gc`:
  ~/temp/nova.git.test$ git gc
  Counting objects: 1210923, done.
  Delta compression using up to 4 threads.
  Compressing objects: 100% (155559/155559), done.
  Writing objects: 100% (1210923/1210923), done.
  Total 1210923 (delta 1002442), reused 1205966 (delta 997777)
  Removing duplicate objects: 100% (256/256), done.
  Checking connectivity: 1210923, done.
  ~/temp/nova.git.test$ du -hsx * | sort -r | head -10
  6.1M packed-refs
  6.1M info
  4.0K config
  4.0K HEAD
  380M objects
  64K logs
  0B refs
  0B hooks
  0B description
  0B branches
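
For anyone who wants to see the ref packing on something smaller than
nova, here is a rough sketch on a throwaway repo. Everything in it is
made up for illustration (the temp directory, the demo identity, the
five refs/changes/* names); it is not the procedure run against the
real repo. It also assumes git's classic "files" ref backend, where
each unpacked ref is a file under refs/.

```shell
#!/bin/sh
# Sketch only: a scratch repo, not nova.git. All names are hypothetical.
set -e

demo=$(mktemp -d)
git init -q "$demo/work"
cd "$demo/work"

# One empty commit so the refs have something to point at.
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Create a few loose refs, loosely modeled on Gerrit's refs/changes/*.
for i in 1 2 3 4 5; do
  git update-ref "refs/changes/0$i/$i/1" HEAD
done

# Count loose ref files before and after gc.
loose_before=$(find .git/refs -type f | wc -l | tr -d ' ')
git gc -q
loose_after=$(find .git/refs -type f | wc -l | tr -d ' ')

# gc packs refs by default, so the change refs should now be listed here.
packed=$(grep -c 'refs/changes/' .git/packed-refs)

echo "loose refs before: $loose_before, after: $loose_after, packed change refs: $packed"
```

The loose count should drop once gc has run, with the change refs
showing up as lines in packed-refs instead. That is the same move seen
above, where refs goes to 0B and packed-refs appears.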


[1]  https://git-scm.com/docs/git-gc   ('prune is on by default')

On Mon, Jun 13, 2016 at 7:58 AM, James E. Blair <corvus at inaugust.com> wrote:
> Zaro <zaro0508 at gmail.com> writes:
>
>> I forgot to mention that the apps we use (gerrit and cgit) to host our
>> git repos do read the repos directly from disk therefore I think that
>> performing a gc on the repos would provide a performance improvement
>> (CPU and memory utilization) to gerrit and cgit.  It might be
>> difficult to quantify how much of an improvement since both those apps
>> do some caching of the repo data.  Anyways I think there would be
>> other benefits of `git gc` over `gerrit repack -adf` besides just
>> recovering disk space.  -Khai
>
> What was the effect of 'git prune' with 'git gc'?  In my original
> message, I mentioned that the two together had the greatest effect on
> disk space -- the change in ref structure could have a significant
> impact to cloning time as well, but also to our ability to issue
> corrective modifications to the repos.
>
> -Jim


