[OpenStack-Infra] An idea to scale Zuul

Joshua Hesketh joshua.hesketh at rackspace.com
Fri Jan 10 01:31:11 UTC 2014


Hey,

Sounds good to me :-).

A more immediate option for reducing the load on zuul could be to run a 
duplicate zuul with only the check pipeline. That is, run one zuul per 
pipeline. In fact, we could essentially distribute independent pipelines 
(but I realise that part would require a bit of refactoring).

Cheers,
Josh

Rackspace Australia

On 1/9/14 2:59 PM, James E. Blair wrote:
> Hi,
>
> When Zuul gets very busy, it can end up launching hundreds of jobs
> nearly simultaneously.  Each of them has to perform several git fetch
> operations to obtain the changes needed for testing.  They fetch from
> the git repos on the Zuul server because Zuul itself is creating those
> commits by locally merging several changes together according to what's
> in the queue.
>
> The act of fetching and merging git patchsets (which is single
> threaded) adds some load to the server, but in particular, serving those
> git refs to 400 Jenkins nodes nearly simultaneously can also be a bit of
> a burden.  It was too much for our previous server; we've moved Zuul to
> a faster server now, but it would be nice to have a more scalable
> solution for the future.
>
> I'd like to move the Zuul git merging component into a separate process
> that can be located on a separate host (or hosts) and scaled out.
>
> The current zuul-server would continue to manage the queue and launch
> jobs, but as it processes the queue and decides which changes should be
> composed and built into zuul git refs, it would package the info about
> each ref and put it on the gearman queue as a work item.  An instance of
> the new component (zuul-merger) would fetch that job and fetch the
> needed refs from Gerrit, and merge them.  It would also serve the
> resulting git repo in the same way that Zuul does now.
>
> Zuul would not have to wait for a response before continuing to process
> the queue, and since it's not doing any actual work, it will be able to
> move through the queue _much_ faster than currently.  Once Zuul _does_
> receive a completion response from a zuul-merger, it can then launch the
> jobs for that change.  It will pass the URL for that particular
> zuul-merger (as ZUUL_URL) to the jobs so that they know from which
> merger to fetch the zuul ref.  We can also use the cancel job
> functionality in gearman if Zuul decides to reorder the queue.
>
> We can scale out the mergers horizontally and they can operate in
> parallel, which should also improve the responsiveness of overall queue
> processing.
>
> The only downside I currently foresee is that if we scale out the
> mergers too much, we will see a performance impact on gerrit; therefore
> we should anticipate having a reasonably small number of these (2-8,
> perhaps).
>
> Since this is already quite modular, I think the implementation should
> be relatively simple.
>
> How does that sound?
>
> -Jim
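
For illustration only, the flow proposed in the quoted message (the
scheduler queues merge work without waiting, mergers run in parallel, and
jobs launch once a merger reports back with its URL) might be sketched
roughly as below. This is a simulation using Python's standard queue and
threading modules in place of gearman; MergeRequest, MergeResult,
merger_worker, run_scheduler, and the example.org URLs are all
hypothetical names, not part of Zuul.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class MergeRequest:
    """Hypothetical work item: the changes to merge into one zuul ref."""
    ref: str
    changes: list


@dataclass
class MergeResult:
    """Hypothetical completion response sent back to the scheduler."""
    ref: str
    zuul_url: str  # URL of the merger now serving this ref (cf. ZUUL_URL)


def merger_worker(name, work_q, done_q):
    """Stand-in for one zuul-merger process."""
    while True:
        req = work_q.get()
        if req is None:  # sentinel: no more work, shut down
            break
        # A real merger would fetch the changes from Gerrit and merge them
        # here; this sketch only reports which merger handled the ref.
        url = f"git://{name}.example.org/p"  # assumed URL layout
        done_q.put(MergeResult(ref=req.ref, zuul_url=url))


def run_scheduler(requests, merger_names):
    """Queue all merge work up front, then 'launch' jobs as results arrive."""
    work_q, done_q = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=merger_worker, args=(name, work_q, done_q))
        for name in merger_names
    ]
    for worker in workers:
        worker.start()

    # The scheduler does not wait for a response before moving on.
    for req in requests:
        work_q.put(req)

    # A job is launched only once some merger has completed its ref.
    launched = {}
    for _ in requests:
        result = done_q.get()
        launched[result.ref] = result.zuul_url

    for _ in workers:
        work_q.put(None)
    for worker in workers:
        worker.join()
    return launched


if __name__ == "__main__":
    reqs = [MergeRequest(f"refs/zuul/master/Z{i}", [f"{100 + i},1"])
            for i in range(4)]
    for ref, url in sorted(run_scheduler(reqs, ["merger01", "merger02"]).items()):
        print(ref, "-> fetch from", url)
```

Which merger picks up which ref is not deterministic here, which is the
reason each job needs to be told where to fetch from.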



