[placement] zuul job dependencies for greater good?

Sean Mooney smooney at redhat.com
Tue Feb 26 11:04:50 UTC 2019


On Mon, 2019-02-25 at 19:42 -0500, Clark Boylan wrote:
> On Mon, Feb 25, 2019, at 12:51 PM, Ben Nemec wrote:
> > 
> 
> snip
> 
> > That said, I wouldn't push too hard in either direction until someone 
> > crunched the numbers and figured out how much time it would have saved 
> > to not run long tests on patch sets with failing unit tests. I feel like 
> > it's probably possible to figure that out, and if so then we should do 
> > it before making any big decisions on this.
> 
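As a rough illustration of the number crunching Ben describes above, here is a minimal sketch. It assumes build records can be exported with a change/patch set id, job name, result and node time; the `Build` layout, the `FAST_JOBS` set and the sample data are all hypothetical and are not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """One job run on one patch set (hypothetical record layout)."""
    change: str        # change number and patch set, e.g. "1000,1"
    job: str           # job name
    result: str        # "SUCCESS" or "FAILURE"
    node_seconds: int  # run time multiplied by the number of nodes used


# Assumption: these jobs count as "fast"; every other job is treated as long.
FAST_JOBS = {"openstack-tox-pep8", "openstack-tox-py27", "openstack-tox-py36"}


def potential_savings(builds):
    """Return (saved, total) node-seconds, where `saved` is the time spent on
    long jobs for patch sets on which at least one fast job failed."""
    failed_fast = {b.change for b in builds
                   if b.job in FAST_JOBS and b.result != "SUCCESS"}
    total = sum(b.node_seconds for b in builds)
    saved = sum(b.node_seconds for b in builds
                if b.change in failed_fast and b.job not in FAST_JOBS)
    return saved, total


# Made-up sample data, only to show the shape of the calculation.
sample = [
    Build("1000,1", "openstack-tox-pep8", "FAILURE", 300),
    Build("1000,1", "tempest-full", "SUCCESS", 5400),
    Build("1001,1", "openstack-tox-pep8", "SUCCESS", 300),
    Build("1001,1", "tempest-full", "FAILURE", 6000),
]
saved, total = potential_savings(sample)
print(f"{saved} of {total} node-seconds ({saved / total:.0%}) could have been skipped")
```

With real data the interesting part is how `saved` compares to `total` over a few weeks of check runs, not the absolute figure.
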
Clark, this sounds like an interesting topic to dig into in person at the PTG/Forum.
Do you think we could do two things in parallel?
1. Find a slot, maybe in the infra track, to discuss this.
2. Create a new "fast-check" pipeline in Zuul so we can do some experiments.

If we have a second pipeline with almost identical triggers, we can propose in-tree job
changes without merging them and experiment with how this might work.
I can submit a patch to the project-config repo to do that, but wanted to check on the ML first.

Again, to be clear, my suggestion for an experiment is to modify the gate jobs to require approval
from Zuul in both the check and fast-check pipelines, and to kick off jobs in both pipelines in parallel,
so initially the check pipeline jobs would not be conditional on the fast-check pipeline jobs.

The intent is to run exactly the same amount of tests we do today, but to have Zuul comment back in two batches,
one from each pipeline.
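
To make the "same tests, two batches" idea concrete, here is a small sketch of when each pipeline would report. It assumes all jobs start at the same time and that a pipeline reports once its slowest job finishes; the job durations (in minutes) and the fast/slow split are made-up values for illustration only.

```python
def report_times(durations, fast_jobs):
    """Model two pipelines started in parallel on the same patch set.

    durations: mapping of job name to run time in minutes
    fast_jobs: set of job names that would move to the fast-check pipeline
    """
    fast = [d for job, d in durations.items() if job in fast_jobs]
    slow = [d for job, d in durations.items() if job not in fast_jobs]
    return {
        # A pipeline can only report once its slowest job has finished.
        "fast_check": max(fast, default=0),
        "check": max(slow, default=0),
        # The total amount of testing does not change, only who reports it.
        "total": sum(durations.values()),
    }


# Hypothetical durations in minutes.
durations = {
    "openstack-tox-pep8": 5,
    "openstack-tox-py36": 12,
    "tempest-full": 95,
    "grenade": 70,
}
result = report_times(durations, {"openstack-tox-pep8", "openstack-tox-py36"})
print(result)
```

In this model the first comment would arrive once the fast jobs are done, while the total job time stays the same as with a single check pipeline.
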

As a step two, I would also be interested in merging all of the tox env jobs into one.
I think that could be done by creating a new job that inherits from the base tox job and just invokes the run playbooks
of all the tox-<env> jobs from a single playbook.

I can do experiment 2 entirely from the in-repo zuul.yaml file.

I think it would be interesting to do a test with "do not merge" patches to nova or placement and
see how that works.

> For numbers the elastic-recheck tool [0] gives us fairly accurate tracking of which issues in the system cause tests
> to fail. You can use this as a starting point to potentially figure out how expensive indentation errors caught by the
> pep8 jobs end up being or how often unit tests fail. You probably need to tweak the queries there to get that specific
> though.
> 
> Periodically I also dump node resource utilization by project, repo, and job [1]. I haven't automated this because
> Tobiash has written a much better thing that has Zuul inject this into graphite and we should be able to set up a
> grafana dashboard for that in the future instead.
> 
> These numbers won't tell a whole story, but should paint a fairly accurate high level picture of the types of things
> we should look at to be more node efficient and "time in gate" efficient. Looking at these two really quickly myself
> it seems that job timeouts are a big cost (anyone looking into why our jobs time out?).
> 
> [0] http://status.openstack.org/elastic-recheck/index.html
> [1] http://paste.openstack.org/show/746083/
> 
> Hope this helps,
> Clark
> 



