[OpenStack-Infra] An idea to scale Zuul
Clint Byrum
clint at fewbar.com
Thu Jan 9 12:04:00 UTC 2014
Excerpts from jeblair's message of 2014-01-08 23:59:45 -0700:
> Hi,
>
> When Zuul gets very busy, it can end up launching hundreds of jobs
> nearly simultaneously. Each of them has to perform several git fetch
> operations to obtain the changes needed for testing. They fetch from
> the git repos on the Zuul server because Zuul itself is creating those
> commits by locally merging several changes together according to what's
> in the queue.
>
> The act of fetching and merging git patchsets (which is
> single-threaded) adds some load to the server, but in particular, serving those
> git refs to 400 Jenkins nodes nearly simultaneously can also be a bit of
> a burden. It was too much for our previous server; we've moved Zuul to
> a faster server now, but it would be nice to have a more scalable
> solution for the future.
>
> I'd like to move the Zuul git merging component into a separate process
> that can be located on a separate host (or hosts) and scaled out.
>
> The current zuul-server would continue to manage the queue and launch
> jobs, but as it processes the queue and decides which changes should be
> composed and built into zuul git refs, it would package the info about
> each ref and put it on the gearman queue as a work item. An instance of
> the new component (zuul-merger) would fetch that job and fetch the
> needed refs from Gerrit, and merge them. It would also serve the
> resulting git repo in the same way that Zuul does now.
>
> Zuul would not have to wait for a response before continuing to process
> the queue, and since it's not doing any actual work, it will be able to
> move through the queue _much_ faster than currently. Once Zuul _does_
> receive a completion response from a zuul-merger, it can then launch the
> jobs for that change. It will pass the URL for that particular
> zuul-merger (as ZUUL_URL) to the jobs so that they know from which
> merger to fetch the zuul ref. We can also use the cancel job
> functionality in gearman if Zuul decides to reorder the queue.
>
> We can scale out the mergers horizontally and they can operate in
> parallel, which should also improve the responsiveness of overall queue
> processing.
>
> The only downside I currently foresee is that if we scale out the
> mergers too much, we will see a performance impact on gerrit; therefore
> we should anticipate having a reasonably small number of these (2-8,
> perhaps).
>
> Since this is already quite modular, I think the implementation should
> be relatively simple.
>
> How does that sound?
This takes advantage of my favorite part of the gearman worker/client
model. You can have exactly as many workers as your back-end can handle,
and then use the queue length as a direct measurement of latency, which
helps you decide when to scale out the back-end (in this case, Gerrit).
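
To make the queue-length point concrete, here is a rough sketch in Python.
The function names, the average merge time, and the target wait are all
made up for illustration; none of this is Zuul or gearman code, and the
queue depth is assumed to come from whatever the gearman server reports
for the merge function.

```python
import math


def estimated_wait_seconds(queue_length, avg_job_seconds, workers):
    """Rough time until the last queued item starts, with `workers` in parallel."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return queue_length * avg_job_seconds / workers


def workers_needed(queue_length, avg_job_seconds, target_wait_seconds, max_workers):
    """Smallest worker count that meets the target wait.

    The result is capped at `max_workers`, which stands in for the number
    of mergers the back-end (Gerrit here) can tolerate.
    """
    wanted = math.ceil(queue_length * avg_job_seconds / target_wait_seconds)
    return max(1, min(wanted, max_workers))


# Hypothetical numbers: 40 queued merges at about 3 seconds each,
# a 30 second target, and a cap of 8 mergers.
print(workers_needed(40, 3.0, 30.0, 8))
print(estimated_wait_seconds(40, 3.0, 2))
```

If the cap is reached and the estimated wait is still over the target,
adding more mergers won't help; that is the point at which the back-end
itself needs to scale.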