Long, Slow Zuul Queues and Why They Happen

Donny Davis donny at fortnebula.com
Mon Sep 23 15:03:30 UTC 2019


*These are only observations, so please keep in mind I am only trying to
get to the bottom of how efficiently we use our limited resources.*
Please feel free to correct my understanding.

We have some core projects which many other projects depend on: Nova,
Glance, Keystone, Neutron, Cinder, etc.
In the CI, every project gets equal access to resources.
If feature A in a non-core project depends on feature B in a core project,
why is feature B not prioritized?

Can we solve this issue by breaking apart the current equal access
structure into something more granular?
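
To make "more granular" concrete, here is a toy sketch in plain Python of
what weighting node requests by project could look like. This is not Zuul
code or configuration - the weights and project names below are invented
purely to illustrate the idea.

# Toy model of "more granular" scheduling: instead of strict FIFO across
# every project, order pending node requests by a per-project weight so the
# core projects (which everything else depends on) get their nodes first.
# NOT Zuul code; weights and project names are made up for illustration.
from dataclasses import dataclass, field
import heapq
import itertools

PROJECT_WEIGHT = {  # hypothetical: lower number = higher priority
    "nova": 0, "neutron": 0, "keystone": 0, "glance": 0, "cinder": 0,
}
DEFAULT_WEIGHT = 10  # everything else

@dataclass(order=True)
class NodeRequest:
    sort_key: tuple = field(init=False, repr=False)
    project: str
    change: str
    arrival: int  # FIFO tie-breaker within the same weight

    def __post_init__(self):
        self.sort_key = (PROJECT_WEIGHT.get(self.project, DEFAULT_WEIGHT),
                         self.arrival)

def schedule(requests):
    """Yield requests in weight order, falling back to arrival order."""
    heap = list(requests)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

counter = itertools.count()
pending = [
    NodeRequest("someapp", "feature-A", next(counter)),
    NodeRequest("nova", "feature-B", next(counter)),  # feature-A needs this
    NodeRequest("someapp", "bugfix", next(counter)),
]
for req in schedule(pending):
    print(req.project, req.change)
# nova feature-B gets nodes before the non-core changes that depend on it.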

I understand that improving job efficiency will likely result in more,
smaller jobs, but will that actually solve the issue at the gate come this
time in the cycle... every release? (As I am sure it comes up every time.)
More smaller jobs still means more jobs: if the job time is cut in half but
the number of jobs is doubled, the total node time per change stays the
same, so we will probably still have the same issue.
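
Quick back-of-envelope math on that point (all numbers below are invented
for illustration):

# Splitting jobs does not by itself reduce total node time per change.
nodes = 700                    # hypothetical total node capacity
before_jobs, before_minutes = 10, 120   # 10 jobs x 120 min each
after_jobs, after_minutes = 20, 60      # twice the jobs, half the runtime

print(before_jobs * before_minutes)  # 1200 node-minutes per change
print(after_jobs * after_minutes)    # 1200 node-minutes per change - same

# Throughput only improves if total node-minutes per change goes down,
# not just the per-job runtime.
changes_per_hour = nodes * 60 / (after_jobs * after_minutes)
print(round(changes_per_hour, 1))    # 35.0 changes/hour either way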

We have limited resources, and without more providers coming online I fear
this issue is only going to get worse as time goes on if we do nothing.

~/DonnyD




On Fri, Sep 13, 2019 at 3:47 PM Matt Riedemann <mriedemos at gmail.com> wrote:

> On 9/13/2019 2:03 PM, Clark Boylan wrote:
> > We've been fielding a fair bit of questions and suggestions around
> Zuul's long change (and job) queues over the last week or so. As a result I
> tried to put a quick FAQ type document [0] on how we schedule jobs, why we
> schedule that way, and how we can improve the long queues.
> >
> > Hoping that gives us all a better understanding of why we are in the
> current situation and ideas on how we can help to improve things.
> >
> > [0]
> https://docs.openstack.org/infra/manual/testing.html#why-are-jobs-for-changes-queued-for-a-long-time
>
> Thanks for writing this up Clark.
>
> As for the current status of the gate, several nova devs have been
> closely monitoring the gate since we have 3 fairly lengthy series of
> feature changes approved since yesterday and we're trying to shepherd
> those through but we're seeing failures and trying to react to them.
>
> Two issues of note this week:
>
> 1. http://status.openstack.org/elastic-recheck/index.html#1843615
>
> I had pushed a fix for that one earlier in the week but there was a bug
> in my fix which Takashi has fixed:
>
> https://review.opendev.org/#/c/682025/
>
> That was promoted to the gate earlier today but failed on...
>
> 2. http://status.openstack.org/elastic-recheck/index.html#1813147
>
> We have a couple of patches up for that now which might get promoted
> once we are reasonably sure those are going to pass check (promote to
> gate means skipping check which is risky because if it fails in the gate
> we have to re-queue the gate as the doc above explains).
>
> As far as overall failure classifications we're pretty good there in
> elastic-recheck:
>
> http://status.openstack.org/elastic-recheck/data/integrated_gate.html
>
> Meaning for the most part we know what's failing, we just need to fix
> the bugs.
>
> One that continues to dog us (and by "us" I mean OpenStack, not just
> nova) is this one:
>
> http://status.openstack.org/elastic-recheck/gate.html#1686542
>
> The QA team's work to split apart the big tempest full jobs into
> service-oriented jobs like tempest-integrated-compute should have helped
> here but we're still seeing there are lots of jobs timing out which
> likely means there are some really slow tests running in too many jobs
> and those require investigation. It could also be devstack setup that is
> taking a long time, like Clark identified with OSC usage a while back:
>
>
> http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008071.html
>
> If you have questions about how elastic-recheck works or how to help
> investigate some of these failures, like with using
> logstash.openstack.org, please reach out to me (mriedem), clarkb and/or
> gmann in #openstack-qa.
>
> --
>
> Thanks,
>
> Matt
>
>