[OpenStack-Infra] Enhancement to experimental queue feature?

Clark Boylan clark.boylan at gmail.com
Thu Dec 26 23:52:07 UTC 2013


On Thu, Dec 26, 2013 at 1:04 PM, David Kranz <dkranz at redhat.com> wrote:
> So the experimental queue feature is extremely useful and a great addition.
> But, including a pending patch, the tempest experimental queue now has nine
> jobs involving slow heat, grenade forward, neutron isolated, savannah, etc.
> There are surely more to come. This means that each use of check
> experimental for tempest will waste time running many jobs that the invoker
> does not care about, perhaps running much longer than the jobs the user
> actually does care about. Would it be plausible to change the behaviour so
> that
>
> check experimental [regexp]
>
> would run only those jobs on the experimental queue whose name was matched
> by [regexp]? Or something like that?
>
>  -David

I don't think the number of jobs in the experimental queue is a
problem in itself. Those jobs are still a tiny percentage of the total
number of jobs Zuul runs every day. As I understand it, the issue that
needs solving is that long-running jobs in that list make Zuul wait
before reporting back to Gerrit, which delays iterating on fixes for
the shorter-running jobs.

I think the proposed change would address that problem well, but it
will probably require updating Zuul to pass user-generated event data
into the job filtering system so that it can add filters. jeblair may
have other ideas on how this can be solved, but I am pretty sure it
can't be done via configuration today. As an alternative, folks can
look at the Zuul status page and from there get direct info from
Jenkins for real-time feedback. You do not need to wait for the
long-running experimental jobs to complete before examining the other
tests you care about. The other turnaround-time item to consider is
that a second check experimental issued before the first check's
results report back is a no-op. To work around this, submit a newer
patchset to Gerrit, which will cancel the running jobs for the
previous patchset. I think this fits the typical experimental workflow
well: check experimental on the first patchset, find a problem, push a
second patchset to fix it, and so on.
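
To make the proposed "check experimental [regexp]" filtering concrete,
here is a minimal Python sketch of the idea. It is not Zuul code and
does not use any Zuul API: the select_jobs function, the TRIGGER
pattern, and the job names in the usage note are all hypothetical, and
it assumes the review comment text is already available to whatever
does the filtering (which is the part Zuul cannot do via configuration
today).

```python
import re

# Hypothetical trigger syntax: "check experimental" on a line of its own,
# optionally followed by a user-supplied regexp. Not an existing Zuul feature.
TRIGGER = re.compile(
    r"^[ \t]*check experimental(?:[ \t]+(?P<pattern>\S.*?))?[ \t]*$",
    re.MULTILINE,
)


def select_jobs(comment, experimental_jobs):
    """Return the experimental jobs that a review comment asks for."""
    match = TRIGGER.search(comment)
    if match is None:
        # The comment does not trigger the experimental queue at all.
        return []
    pattern = match.group("pattern")
    if pattern is None:
        # Bare trigger: keep today's behaviour and run every job.
        return list(experimental_jobs)
    try:
        job_filter = re.compile(pattern)
    except re.error:
        # Invalid user-supplied regexp: run nothing rather than guess.
        return []
    # Keep only the jobs whose name matches the regexp somewhere.
    return [job for job in experimental_jobs if job_filter.search(job)]
```

With made-up job names such as "job-heat-slow" and
"job-neutron-isolated", a comment of "check experimental neutron" would
select only the second job, while a bare "check experimental" would
still select both.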

The workarounds above aren't perfect, but considering the experimental
queue's purpose I think they should work well enough. Tests should not
remain in the experimental queue for long; a few weeks should be the
goal. If we are actively moving jobs from the experimental queue to
other queues, we reduce the exposure to long-running jobs that
interfere with other jobs. I think we should also be trying to reduce
the amount of magic number stuff that goes into review comments for
Zuul's consumption (it confuses devs as it gets more complicated), and
the proposal to add more data to the comments runs counter to that
goal.

Clark


