[OpenStack-Infra] An idea to scale Zuul

Monty Taylor mordred at inaugust.com
Thu Jan 9 19:26:42 UTC 2014


+100 (apologies for top-post - laptop was stolen, limping along on Windows
for a few days. Ew.)

> -----Original Message-----
> From: Robert Collins [mailto:robertc at robertcollins.net]
> Sent: Wednesday, January 08, 2014 11:52 PM
> To: James E. Blair
> Cc: <openstack-infra at lists.openstack.org>
> Subject: Re: [OpenStack-Infra] An idea to scale Zuul
> 
> Looks good to me.
> 
> On 9 January 2014 19:59, James E. Blair <jeblair at openstack.org> wrote:
> > Hi,
> >
> > When Zuul gets very busy, it can end up launching hundreds of jobs
> > nearly simultaneously.  Each of them has to perform several git fetch
> > operations to obtain the changes needed for testing.  They fetch from
> > the git repos on the Zuul server because Zuul itself is creating those
> > commits by locally merging several changes together according to
> > what's in the queue.
> >
> > Fetching and merging git patchsets (which is single-threaded) adds
> > some load to the server, but serving those git refs to 400 Jenkins
> > nodes nearly simultaneously can be an even bigger burden.  It was too
> > much for our previous server; we've moved Zuul to a faster server now,
> > but it would be nice to have a more scalable solution for the future.
> >
> > I'd like to move the Zuul git merging component into a separate
> > process that can be located on a separate host (or hosts) and scaled
> > out.
> >
> > The current zuul-server would continue to manage the queue and launch
> > jobs, but as it processes the queue and decides which changes should
> > be composed and built into zuul git refs, it would package the info
> > about each ref and put it on the gearman queue as a work item.  An
> > instance of the new component (zuul-merger) would fetch that job and
> > fetch the needed refs from Gerrit, and merge them.  It would also
> > serve the resulting git repo in the same way that Zuul does now.
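The following is a minimal Python sketch, not from the thread, of the hand-off described above. It shows a work item that zuul-server might package for each ref and a zuul-merger consuming it. A `queue.Queue` stands in for the gearman queue. The `MergeItem` fields, the `submit`/`merge_one` functions, the example project and the ref names are all hypothetical, and the merge step is stubbed: a real merger would fetch each change from Gerrit and merge it in its local git repo.

```python
import json
import queue
from dataclasses import asdict, dataclass, field


@dataclass
class MergeItem:
    # Hypothetical description of one zuul ref to build.
    project: str
    branch: str
    changes: list = field(default_factory=list)  # Gerrit refs, in queue order
    zuul_ref: str = ""


def submit(work_queue: queue.Queue, item: MergeItem) -> None:
    # zuul-server side: serialize the ref description and enqueue it
    # without waiting for a result (queue.Queue stands in for gearman).
    work_queue.put(json.dumps(asdict(item)))


def merge_one(payload: str) -> dict:
    # zuul-merger side: decode the work item and "merge" the changes.
    # Stubbed: a real merger would git-fetch each ref from Gerrit and
    # merge it onto the branch tip, then serve the resulting repo.
    item = MergeItem(**json.loads(payload))
    merged = [item.branch] + item.changes
    return {"zuul_ref": item.zuul_ref, "merged": merged, "ok": True}


work = queue.Queue()
submit(work, MergeItem("openstack/nova", "master",
                       ["refs/changes/01/1/1", "refs/changes/02/2/3"],
                       "refs/zuul/master/Z1"))
result = merge_one(work.get())
print(result["zuul_ref"], result["merged"])
```

Serializing to JSON keeps the work item independent of the scheduler's in-memory objects, which is what lets the merger run as a separate process on another host.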
> >
> > Zuul would not have to wait for a response before continuing to
> > process the queue, and since it's not doing any actual work, it will
> > be able to move through the queue _much_ faster than it does now.  Once
> > Zuul _does_ receive a completion response from a zuul-merger, it can
> > then launch the jobs for that change.  It will pass the URL for that
> > particular zuul-merger (as ZUUL_URL) to the jobs so that they know
> > from which merger to fetch the zuul ref.  We can also use the cancel
> > job functionality in gearman if Zuul decides to reorder the queue.
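A minimal sketch, again not from the thread, of the asynchronous bookkeeping this implies. Merge requests are tracked by change instead of being waited on. Jobs are launched only when a completion arrives, with ZUUL_URL pointing at the merger that built the ref. A reorder cancels merges that are still pending. The `Scheduler` class, its method names, the handle format, the merger URLs and the `ZUUL_REF` parameter are assumptions for illustration. The cancellation is only recorded here; a real implementation would also ask gearman to cancel the job.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    # Hypothetical state for the non-blocking flow.
    pending: dict = field(default_factory=dict)    # change -> merge job handle
    launched: list = field(default_factory=list)   # (change, job parameters)
    cancelled: list = field(default_factory=list)  # handles of cancelled merges
    _next: int = 0

    def request_merge(self, change: str) -> str:
        # Submit the merge and return immediately with a handle.
        self._next += 1
        handle = f"H:{self._next}"
        self.pending[change] = handle
        return handle

    def on_merge_complete(self, change: str, merger_url: str, zuul_ref: str) -> None:
        # Only now are the test jobs launched, told which merger to fetch from.
        if self.pending.pop(change, None) is None:
            return  # cancelled or unknown: ignore the stale result
        params = {"ZUUL_URL": merger_url, "ZUUL_REF": zuul_ref}
        self.launched.append((change, params))

    def on_reorder(self, changes_to_redo: list) -> None:
        # A reordered queue invalidates pending merges for these changes.
        for change in changes_to_redo:
            handle = self.pending.pop(change, None)
            if handle is not None:
                self.cancelled.append(handle)


s = Scheduler()
s.request_merge("1,1")
s.request_merge("2,3")
s.on_reorder(["2,3"])
s.on_merge_complete("1,1", "http://merger01.example.org/p", "refs/zuul/master/Z1")
s.on_merge_complete("2,3", "http://merger02.example.org/p", "refs/zuul/master/Z2")
print(s.launched, s.cancelled)
```

The late result for the cancelled change is dropped because its entry is no longer in `pending`, so a completion that races with a reorder cannot launch jobs against a stale ref.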
> >
> > We can scale out the mergers horizontally and they can operate in
> > parallel, which should also improve the responsiveness of overall
> > queue processing.
> >
> > The only downside I currently foresee is that if we scale out the
> > mergers too much, we will see a performance impact on Gerrit;
> > therefore we should anticipate having a reasonably small number of
> > these (2-8, perhaps).
> >
> > Since this is already quite modular, I think the implementation should
> > be relatively simple.
> >
> > How does that sound?
> >
> > -Jim
> >
> > _______________________________________________
> > OpenStack-Infra mailing list
> > OpenStack-Infra at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> 
> 
> 
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
