Long, Slow Zuul Queues and Why They Happen

Donny Davis donny at fortnebula.com
Mon Sep 23 15:20:39 UTC 2019


In a different thread I had another possible suggestion - it's probably more
appropriate for this one. [1]

It would also be helpful to give the project a way to prefer certain infra
providers for certain jobs.

For the most part Fort Nebula is terrible at CPU-bound, long-running jobs...
I wish I could make it better, but I cannot.

Is there a method we could come up with that would allow us to exploit
the particular traits of a given provider? Maybe some additional metadata
that says what that provider is best at doing?

For example, highly IO-bound jobs work like gangbusters on FN because the
underlying storage is very fast, but CPU-bound jobs do the direct opposite.
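
To make the idea concrete, here is a minimal sketch of what trait metadata
and a provider preference could look like. It is not an existing Zuul or
Nodepool feature; the Provider class, the "strengths"/"weaknesses" fields,
the rank_providers helper and the "other-cloud" provider name are all
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record carrying trait metadata."""
    name: str
    strengths: frozenset = frozenset()   # workload profiles it handles well
    weaknesses: frozenset = frozenset()  # workload profiles it handles badly


def rank_providers(job_profile, providers):
    """Return providers ordered best-first for the given job profile.

    Providers that list the profile as a strength come first, neutral
    providers next, and providers that list it as a weakness last.
    Providers with equal scores keep their original relative order.
    """
    def score(provider):
        if job_profile in provider.strengths:
            return 1
        if job_profile in provider.weaknesses:
            return -1
        return 0

    return sorted(providers, key=score, reverse=True)


# Illustrative data only: FN is described above as fast for IO-bound jobs
# and poor for CPU-bound ones; "other-cloud" is a made-up neutral provider.
providers = [
    Provider("fortnebula", frozenset({"io"}), frozenset({"cpu"})),
    Provider("other-cloud"),
]

io_order = [p.name for p in rank_providers("io", providers)]
cpu_order = [p.name for p in rank_providers("cpu", providers)]
```

This only expresses a preference: a job can still land on a less suitable
provider when the preferred one has no capacity, so nothing is starved.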

Thoughts? ~/DonnyD


[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009592.html

On Mon, Sep 23, 2019 at 11:14 AM Clark Boylan <cboylan at sapwetik.org> wrote:

> On Mon, Sep 23, 2019, at 8:03 AM, Donny Davis wrote:
> > *These are only observations, so please keep in mind I am only trying
> > to get to the bottom of efficiency with our limited resources.*
> > Please feel free to correct my understanding
> >
> > We have some core projects which many other projects depend on - Nova,
> > Glance, Keystone, Neutron, Cinder, etc.
> > In the CI it's equal access for any project.
> > If feature A in a non-core project depends on feature B in a core project -
> > why is feature B not prioritized?
>
> The priority queuing happens per "gate queue". The integrated gate (nova,
> cinder, keystone, etc) has one queue, Tripleo has another, OSA has one and
> so on. We do this so that important work can happen across disparate
> efforts.
>
> What this means is if Nova and the rest of the integrated gate have a set
> of priority changes, they should stop approving other changes while they
> work to merge those priority items. I have suggested that OpenStack needs
> an "air traffic controller" to help coordinate these efforts particularly
> around feature freeze time (I suggested it to both the QA team and release
> team). Any queue could use one if they wanted to.
>
> All that to say you can do this today, but it requires humans to work
> together and communicate what their goals are, then give the CI system the
> correct information to act on these changes in the desired manner.
>
> >
> > Can we solve this issue by breaking apart the current equal access
> > structure into something more granular?
> >
> > I understand that improving job efficiencies will likely result in more,
> > smaller jobs, but will that actually solve the issue at the gate come this
> > time in the cycle...every release? (as I am sure it comes up every time)
> > More, smaller jobs will result in more jobs - if the job time is cut in
> > half, but the # of jobs is doubled, we will probably still have the same
> > issue.
> >
> > We have limited resources and without more providers coming online I
> > fear this issue is only going to get worse as time goes on if we do
> > nothing.
> >
> > ~/DonnyD
> >
>
>