[OpenStack-Infra] zuul-merger and garbage collection
cboylan at sapwetik.org
Tue May 12 04:45:48 UTC 2015
On Mon, May 11, 2015, at 02:25 AM, Heald, Nicola wrote:
> Hi all,
> We've noticed that when repos have been in use for a while by
> zuul-merger, they have a lot of left over objects that slow down git
> operations. Zuul-merger also seems to keep open a bunch of file handles
> to various objects, so running `git gc` on the repos seems dangerous, if
> they got deleted by `git gc` while zm has them open still, that might
> cause all sorts of trouble for zuul-merger, yes?
> What's the safest way of doing gc on the repos? Or would this be a good
> feature to add to zuul-merger?
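I would expect git gc on zuul merger repos to be safe. By default, git gc
only prunes unreachable objects once they are old enough: reflog entries
for unreachable commits are kept for 30 days, and loose unreachable
objects are kept for two weeks.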
Additionally zuul itself tries to be resilient to changes of the repos
under it. In fact you should be able to just delete all the repos while
the merger is running and the merger will reclone on the next job
(though this will likely race running jobs and is safest to run when the
merger is stopped). Also, due to bugs in GitPython we actually have to
recreate the repo objects in Python prior to many operations, which
should close and then reopen files to keep them up to date.
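As a rough illustration of the "reclone if the repo is gone" behaviour described above, here is a minimal sketch. It is not zuul's actual code: the function name `ensure_clone`, its parameters, and the injectable `clone` callable are all hypothetical, and checking for a `.git` directory is an assumption about how a missing repo would be detected.

```python
import os
import subprocess


def ensure_clone(path, remote_url, clone=None):
    """Reclone the repo if it is missing.

    Returns True if a clone was created, False if one already existed.
    All names here are hypothetical, not zuul's API.
    """
    # Assumption: a working copy counts as present if it has a .git directory.
    if os.path.isdir(os.path.join(path, ".git")):
        return False
    if clone is None:
        # Default: shell out to the git CLI.
        def clone(url, dest):
            subprocess.run(["git", "clone", url, dest], check=True)
    clone(remote_url, path)
    return True
```

Calling something like this at the start of each job is what makes deleting a repo survivable, but it does not protect a job that is already using the repo when it disappears.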
One example we have run into with GitPython is that if the repo is
repacked (which git can do for you when it decides to), loose object files
may no longer exist and need to be looked up in the pack file instead.
The only way to get GitPython to see that is to make a new repo object.
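The "make a new repo object" workaround can be sketched roughly as follows. This is an illustration under assumptions, not the merger's real code: `with_fresh_repo` and `default_repo_factory` are hypothetical names, and the only GitPython call relied on is the `git.Repo(path)` constructor. The `close()` call is guarded because not every GitPython version provides it.

```python
def default_repo_factory(path):
    # Imported lazily so the sketch can be exercised without GitPython installed.
    import git
    return git.Repo(path)


def with_fresh_repo(path, operation, repo_factory=default_repo_factory):
    """Run operation(repo) against a newly created repo object.

    A new object is built on every call so that no stale state from
    before a repack is reused.
    """
    repo = repo_factory(path)
    try:
        return operation(repo)
    finally:
        # Release any open handles if this repo object supports it.
        close = getattr(repo, "close", None)
        if callable(close):
            close()
```

For example, `with_fresh_repo("/path/to/repo", lambda repo: repo.head.commit.hexsha)` would read the current HEAD through a repo object created just for that call (the path is a placeholder).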
All this to say that the zuul mergers should already be quite resilient
to files going away if they somehow do go away with a 30 day expiration.
However, we don't gc those repos so I have no hard evidence that git gc
specifically is safe.
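If someone wants to try gc on the merger repos anyway, a cautious approach might look like the sketch below, run while the merger is stopped since the safety of gc under a live merger is unverified. The helper names and the `30.days.ago` default are assumptions chosen to match the expiration discussed above; `git -C <path>` and `git gc --prune=<date>` are standard git options.

```python
import subprocess


def build_gc_command(repo_path, prune="30.days.ago"):
    """Build a git gc command line with an explicit prune grace period."""
    return ["git", "-C", repo_path, "gc", "--prune=" + prune]


def gc_repo(repo_path, prune="30.days.ago", run=subprocess.run):
    # Assumption: the merger is stopped while this runs.
    return run(build_gc_command(repo_path, prune), check=True)
```

Keeping the command construction separate from execution makes it easy to log or dry-run the exact command before touching a repo.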