[placement] zuul job dependencies for greater good?
Zuul has a feature that makes it possible to only run some jobs after others have passed: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies

Except for tempest and grenade (which take about an hour to 1.5 hours to run, sometimes a lot more) the usual time for any of the placement tests is less than 6 minutes each, sometimes less than 4.

I've been wondering if we might want to consider only running tempest and grenade if the other tests have passed first? So here's this message seeking opinions.

On the one hand this ought to be redundant. The expectation is that a submitter has already done at least one python version worth of unit and functional tests. Fast8 too. On one of my machines 'tox -efunctional-py37,py37,pep8' on warmed up virtualenvs is a bit under 53 seconds. So it's not like it's a huge burden or cpu melting.

But on the other hand, if someone has failed to do that, and they have failing tests, they shouldn't get the pleasure of wasting a tempest or grenade node.

Another argument I've heard for not doing this is if there are failures of different types in different tests, having all that info for the round of fixing that will be required is good. That is, getting a unit failure, fixing that, then submitting again, only to get an integration failure which then needs another round of fixing (and testing) might be rather annoying. I'd argue that that's important information about unit or functional tests being insufficient.

I'm not at all sold on the idea, but thought it worth "socializing" for input.

Thanks.

-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
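To make the idea concrete, a minimal sketch of what this could look like in a .zuul.yaml project stanza (job names here are illustrative, not placement's actual job list):

    - project:
        check:
          jobs:
            - openstack-tox-pep8
            - openstack-tox-py37
            - openstack-tox-functional-py37
            # Only start the expensive jobs once the quick ones have passed;
            # if any of the jobs listed under dependencies fail, these are
            # never started at all.
            - tempest-full-py3:
                dependencies:
                  - openstack-tox-pep8
                  - openstack-tox-py37
                  - openstack-tox-functional-py37
            - grenade-py3:
                dependencies:
                  - openstack-tox-pep8
                  - openstack-tox-py37
                  - openstack-tox-functional-py37

With job.dependencies, a failure in any listed job means the dependent job is simply not run, which is the behaviour being debated in this thread.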
I asked for the same thing some time ago but we didn't have time to implement it, as it is much harder to do in projects where the list of jobs changes a lot and quickly. Maybe if we had some placeholder jobs like phase1/2/3 it would be easier to migrate to such a setup:

stage1 - cheap jobs like linters, docs, ... - <10min
stage2 - medium jobs like functional - <30min
stage3 - fat/expensive jobs like tempest, update/upgrade - >30min

The idea of placeholders is to avoid having to refactor lots of dependencies.

Cheers
Sorin
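A rough sketch of how such staging placeholders might be wired up, assuming a lightweight no-op job is acceptable (the playbooks/noop.yaml playbook is hypothetical, the nodeset is empty so the placeholder runs on the executor, and the job names are again illustrative):

    - job:
        name: stage-1-complete
        description: Placeholder; succeeds once the cheap stage-1 jobs pass.
        run: playbooks/noop.yaml     # hypothetical trivial playbook
        nodeset:
          nodes: []                  # runs on the executor, no test node used

    - job:
        name: stage-2-complete
        parent: stage-1-complete     # inherits the no-op playbook and empty nodeset
        description: Placeholder; succeeds once the stage-2 jobs pass.

    - project:
        check:
          jobs:
            - openstack-tox-pep8
            - openstack-tox-docs
            - stage-1-complete:
                dependencies: [openstack-tox-pep8, openstack-tox-docs]
            # stage-2 jobs only start once stage-1 is green
            - openstack-tox-py37:
                dependencies: [stage-1-complete]
            - openstack-tox-functional-py37:
                dependencies: [stage-1-complete]
            - stage-2-complete:
                dependencies: [openstack-tox-py37, openstack-tox-functional-py37]
            # stage-3 only ever references the placeholder, so the cheaper
            # job lists can change without touching the expensive jobs
            - tempest-full-py3:
                dependencies: [stage-2-complete]

Whether an executor-only placeholder like this interacts cleanly with the base job's pre/post playbooks is something that would need verifying before relying on it.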
On 25 Feb 2019, at 17:47, Chris Dent <cdent+os@anticdent.org> wrote:
Zuul has a feature that makes it possible to only run some jobs after others have passed:
https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies
Except for tempest and grenade (which take about an hour to 1.5 hours to run, sometimes a lot more) the usual time for any of the placement tests is less than 6 minutes each, sometimes less than 4.
I've been wondering if we might want to consider only running tempest and grenade if the other tests have passed first? So here's this message seeking opinions.
On the one hand this ought to be redundant. The expectation is that a submitter has already done at least one python version worth of unit and functional tests. Fast8 too. On one of my machines 'tox -efunctional-py37,py37,pep8' on warmed up virtualenvs is a bit under 53 seconds. So it's not like it's a huge burden or cpu melting.
But on the other hand, if someone has failed to do that, and they have failing tests, they shouldn't get the pleasure of wasting a tempest or grenade node.
Another argument I've heard for not doing this is if there are failures of different types in different tests, having all that info for the round of fixing that will be required is good. That is, getting a unit failure, fixing that, then submitting again, only to get an integration failure which then needs another round of fixing (and testing) might be rather annoying.
I'd argue that that's important information about unit or functional tests being insufficient.
I'm not at all sold on the idea, but thought it worth "socializing" for input.
Thanks.
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On Mon, 2019-02-25 at 18:20 +0000, Sorin Sbarnea wrote:

I asked for the same thing some time ago but we didn't have time to implement it, as it is much harder to do in projects where the list of jobs changes a lot and quickly.

Maybe if we had some placeholder jobs like phase1/2/3 it would be easier to migrate to such a setup:

stage1 - cheap jobs like linters, docs, ... - <10min
stage2 - medium jobs like functional - <30min
stage3 - fat/expensive jobs like tempest, update/upgrade - >30min

Yep, I also suggested something similar, where we would run all the non-dsvm jobs first and then everything else. Whether the second set was conditional or always run was a separate conversation, but I think there is value in reporting the results of the quick jobs first and then everything else. I personally would do just two levels. os-vif, for example, completes all jobs except the one tempest job in under 6 minutes. Granted, I run all the non-integration jobs locally for my own patches, but it would be nice to get the feedback quicker for other people's patches, as I often find myself checking zuul.openstack.org.
The idea of placeholders is to avoid having to refactor lots of dependencies.
Cheers Sorin
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers. If you want quick feedback on fast-running jobs (that are running in parallel with slower-running jobs), zuul.o.o is available and easy to use.

If we wanted to get more efficient about our CI resources, there are other possibilities I would prefer to see tried first. For example, do we need a whole separate node to run each unit & functional job, or could we run them in parallel (or even serially, since all together they would probably still take less time than e.g. a tempest) on a single node?

I would also support a commit message tag (or something) that tells zuul not to bother running CI right now. Or a way to go to zuul.o.o and yank a patch out.

Realizing of course that these suggestions come from someone who uses zuul in the most superficial way possible (like, I wouldn't know how to write a... job? playbook? with a gun to my head) so they're probably exponentially harder than using the thing Chris mentioned.

-efried
On 2/25/19 2:36 PM, Eric Fried wrote:
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers. If you want quick feedback on fast-running jobs (that are running in parallel with slower-running jobs), zuul.o.o is available and easy to use.
In general I agree with this sentiment. However, I do think there comes a point where we'd be penny-wise and pound-foolish. If we're talking about 5 minute unit test jobs I'm not sure how much human time you're actually losing by serializing behind them, but you may be saving significant amounts of computer time. If we're talking about sufficient gains in gate throughput it might be worth it to lose 5 minutes here or there and in other cases save a couple of hours by not waiting in a long queue behind jobs on patches that are unmergeable anyway.

That said, I wouldn't push too hard in either direction until someone crunched the numbers and figured out how much time it would have saved to not run long tests on patch sets with failing unit tests. I feel like it's probably possible to figure that out, and if so then we should do it before making any big decisions on this.
If we wanted to get more efficient about our CI resources, there are other possibilities I would prefer to see tried first. For example, do we need a whole separate node to run each unit & functional job, or could we run them in parallel (or even serially, since all together they would probably still take less time than e.g. a tempest) on a single node?
I would also support a commit message tag (or something) that tells zuul not to bother running CI right now. Or a way to go to zuul.o.o and yank a patch out.
Realizing of course that these suggestions come from someone who uses zuul in the most superficial way possible (like, I wouldn't know how to write a... job? playbook? with a gun to my head) so they're probably exponentially harder than using the thing Chris mentioned.
-efried
On Mon, Feb 25, 2019, at 12:51 PM, Ben Nemec wrote:
snip
That said, I wouldn't push too hard in either direction until someone crunched the numbers and figured out how much time it would have saved to not run long tests on patch sets with failing unit tests. I feel like it's probably possible to figure that out, and if so then we should do it before making any big decisions on this.
For numbers, the elastic-recheck tool [0] gives us fairly accurate tracking of which issues in the system cause tests to fail. You can use this as a starting point to potentially figure out how expensive indentation errors caught by the pep8 jobs end up being, or how often unit tests fail. You probably need to tweak the queries there to get that specific though.

Periodically I also dump node resource utilization by project, repo, and job [1]. I haven't automated this because Tobiash has written a much better thing that has Zuul inject this into graphite, and we should be able to set up a grafana dashboard for that in the future instead.

These numbers won't tell the whole story, but should paint a fairly accurate high level picture of the types of things we should look at to be more node efficient and "time in gate" efficient. Looking at these two really quickly myself, it seems that job timeouts are a big cost (anyone looking into why our jobs time out?).

[0] http://status.openstack.org/elastic-recheck/index.html
[1] http://paste.openstack.org/show/746083/

Hope this helps,
Clark
On Mon, 2019-02-25 at 19:42 -0500, Clark Boylan wrote:
On Mon, Feb 25, 2019, at 12:51 PM, Ben Nemec wrote:
snip
That said, I wouldn't push too hard in either direction until someone crunched the numbers and figured out how much time it would have saved to not run long tests on patch sets with failing unit tests. I feel like it's probably possible to figure that out, and if so then we should do it before making any big decisions on this.
Clark, this sounds like an interesting topic to dig into in person at the PTG/Forum. Do you think we could do two things in parallel?

1. Find a slot, maybe in the infra track, to discuss this.
2. Create a new "fast-check" pipeline in Zuul so we can do some experiments.

If we have a second pipeline with almost identical triggers, we can propose in-tree job changes, not merge them, and experiment with how this might work. I can submit a patch to do that to the project-config repo but wanted to check on the ML first.

Again, to be clear, my suggestion for an experiment is to modify the gate jobs to require approval from Zuul in both the check and fast-check pipelines, and to kick off jobs in both pipelines in parallel, so initially the check pipeline jobs would not be conditional on the fast-check pipeline jobs. The intent is to run exactly the same amount of testing we do today, but to have Zuul comment back in two batches, one from each pipeline.

As a step two I would also be interested in merging all of the tox env jobs into one. I think that could be done by creating a new job that inherits from the base tox job and just invokes the run playbooks of all the tox-<env> jobs from a single playbook. I can do experiment 2 entirely from the in-repo zuul.yaml file. I think it would be interesting to do a test with "do not merge" patches to nova or placement and see how that works.
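For the tox-merging part, a minimal sketch of what such a combined job could look like, assuming the existing openstack-tox parent job and that its tox_envlist variable is passed straight through to tox -e (which accepts a comma-separated list); the job name and timeout are made up:

    - job:
        name: openstack-tox-combined          # hypothetical name
        parent: openstack-tox
        description: |
          Run the pep8, unit and functional tox environments serially
          on a single node instead of using one node per environment.
        timeout: 3600                          # allow time for all envs
        vars:
          tox_envlist: pep8,py37,functional-py37

Whether the standard tox role handles a comma-separated envlist without further tweaks is exactly the sort of thing the "do not merge" test patches would confirm.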
For numbers, the elastic-recheck tool [0] gives us fairly accurate tracking of which issues in the system cause tests to fail. You can use this as a starting point to potentially figure out how expensive indentation errors caught by the pep8 jobs end up being, or how often unit tests fail. You probably need to tweak the queries there to get that specific though.
Periodically I also dump node resource utilization by project, repo, and job [1]. I haven't automated this because Tobiash has written a much better thing that has Zuul inject this into graphite and we should be able to set up a grafana dashboard for that in the future instead.
These numbers won't tell the whole story, but should paint a fairly accurate high level picture of the types of things we should look at to be more node efficient and "time in gate" efficient. Looking at these two really quickly myself, it seems that job timeouts are a big cost (anyone looking into why our jobs time out?).
[0] http://status.openstack.org/elastic-recheck/index.html [1] http://paste.openstack.org/show/746083/
Hope this helps, Clark
On Tue, Feb 26, 2019, at 3:04 AM, Sean Mooney wrote:
On Mon, 2019-02-25 at 19:42 -0500, Clark Boylan wrote:
On Mon, Feb 25, 2019, at 12:51 PM, Ben Nemec wrote:
snip
That said, I wouldn't push too hard in either direction until someone crunched the numbers and figured out how much time it would have saved to not run long tests on patch sets with failing unit tests. I feel like it's probably possible to figure that out, and if so then we should do it before making any big decisions on this.
Clark, this sounds like an interesting topic to dig into in person at the PTG/Forum. Do you think we could do two things in parallel? 1. Find a slot, maybe in the infra track, to discuss this. 2. Create a new "fast-check" pipeline in Zuul so we can do some experiments.

If we have a second pipeline with almost identical triggers, we can propose in-tree job changes, not merge them, and experiment with how this might work. I can submit a patch to do that to the project-config repo but wanted to check on the ML first.

Again, to be clear, my suggestion for an experiment is to modify the gate jobs to require approval from Zuul in both the check and fast-check pipelines, and to kick off jobs in both pipelines in parallel, so initially the check pipeline jobs would not be conditional on the fast-check pipeline jobs.
Currently zuul depends on the Gerrit vote data to determine if check has been satisfied for gating requirements. Zuul's verification voting options are currently [-2,-1,0,1,2] with +/-1 for check and +/-2 for gate. Where this gets complicated is how do you resolve different values from different check pipelines, and how do you keep them from racing on updates. This type of setup likely requires a new type of pipeline in zuul that can coordinate with another pipeline to ensure accurate vote posting. Another approach may be to update zuul's reporting capabilities to report intermediate results without votes. That said, is there something that the dashboard is failing to do that this would address? At any time you should be able to check the zuul dashboard for an up to date status of your in progress jobs.
The intent is to run exactly the same amount of testing we do today, but to have Zuul comment back in two batches, one from each pipeline.

As a step two I would also be interested in merging all of the tox env jobs into one. I think that could be done by creating a new job that inherits from the base tox job and just invokes the run playbooks of all the tox-<env> jobs from a single playbook.

I can do experiment 2 entirely from the in-repo zuul.yaml file.

I think it would be interesting to do a test with "do not merge" patches to nova or placement and see how that works.
On Tue, 2019-02-26 at 11:35 -0500, Clark Boylan wrote:
On Tue, Feb 26, 2019, at 3:04 AM, Sean Mooney wrote:
On Mon, 2019-02-25 at 19:42 -0500, Clark Boylan wrote:
On Mon, Feb 25, 2019, at 12:51 PM, Ben Nemec wrote:
snip
That said, I wouldn't push too hard in either direction until someone crunched the numbers and figured out how much time it would have saved to not run long tests on patch sets with failing unit tests. I feel like it's probably possible to figure that out, and if so then we should do it before making any big decisions on this.
Clark, this sounds like an interesting topic to dig into in person at the PTG/Forum. Do you think we could do two things in parallel? 1. Find a slot, maybe in the infra track, to discuss this. 2. Create a new "fast-check" pipeline in Zuul so we can do some experiments.

If we have a second pipeline with almost identical triggers, we can propose in-tree job changes, not merge them, and experiment with how this might work. I can submit a patch to do that to the project-config repo but wanted to check on the ML first.

Again, to be clear, my suggestion for an experiment is to modify the gate jobs to require approval from Zuul in both the check and fast-check pipelines, and to kick off jobs in both pipelines in parallel, so initially the check pipeline jobs would not be conditional on the fast-check pipeline jobs.
Currently zuul depends on the Gerrit vote data to determine if check has been satisfied for gating requirements. Zuul's verification voting options are currently [-2,-1,0,1,2] with +/-1 for check and +/-2 for gate. Where this gets complicated is how do you resolve different values from different check pipelines, and how do you keep them from racing on updates. This type of setup likely requires a new type of pipeline in zuul that can coordinate with another pipeline to ensure accurate vote posting.

Oh right, because there would only be one Zuul user for both pipelines, so they would conflict. I had not thought about that aspect.

Another approach may be to update zuul's reporting capabilities to report intermediate results without votes. That said, is there something that the dashboard is failing to do that this would address? At any time you should be able to check the zuul dashboard for an up to date status of your in progress jobs.

For me, no, but I find that many people don't know about zuul.openstack.org and that you can view the jobs and their logs (once a job finishes) before Zuul comments back. Perhaps posting a comment when Zuul starts that contains a link to zuul.o.o would help the discoverability aspect.
The intent is to run exactly the same amount of testing we do today, but to have Zuul comment back in two batches, one from each pipeline.

As a step two I would also be interested in merging all of the tox env jobs into one. I think that could be done by creating a new job that inherits from the base tox job and just invokes the run playbooks of all the tox-<env> jobs from a single playbook.

I can do experiment 2 entirely from the in-repo zuul.yaml file.

I think it would be interesting to do a test with "do not merge" patches to nova or placement and see how that works.
On 2019-02-26 17:03:51 +0000 (+0000), Sean Mooney wrote:
On Tue, 2019-02-26 at 11:35 -0500, Clark Boylan wrote: [...]
is there something that the dashboard is failing to do that this would address? At any time you should be able to check the zuul dashboard for an up to date status of your in progress jobs.
For me, no, but I find that many people don't know about zuul.openstack.org and that you can view the jobs and their logs (once a job finishes) before Zuul comments back.

Perhaps posting a comment when Zuul starts that contains a link to zuul.o.o would help the discoverability aspect. [...]
We've had some semi-successful experiments in the past with exposing a filtered progress view of Zuul builds in the Gerrit WebUI. Previous attempts were stymied by the sheer volume of status API requests from hundreds of developers with dozens of open browser tabs to different Gerrit changes. Now that we've got the API better cached and separated out to its own service we may be able to weather the storm. There's also new support being worked on in Gerrit for improved CI reporting, which we'll hopefully be able to take advantage of eventually.

-- Jeremy Stanley
On Mon, 2019-02-25 at 14:36 -0600, Eric Fried wrote:

-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers. If you want quick feedback on fast-running jobs (that are running in parallel with slower-running jobs), zuul.o.o is available and easy to use.

I'm aware of the concern with first failure. Originally I had wanted to split check into "precheck" and "check", where check would only run if precheck passed. After talking to people about that a few weeks ago I changed my perspective: we should have fastcheck and check as two pipelines that run in parallel and get two comments back from Zuul. So when the fastcheck jobs finish, Zuul comments back with that set, and when the check jobs finish, you get the second set. Gate would then require that fastcheck and check both have +1 from Zuul to run.
If we wanted to get more efficient about our CI resources, there are other possibilities I would prefer to see tried first. For example, do we need a whole separate node to run each unit & functional job, or could we run them in parallel (or even serially, since all together they would probably still take less time than e.g. a tempest) on a single node?
Currently I'm not sure if Zuul has a way to express "run job 1 on a node, then run job 2, then job 3..."; if it does, this could certainly help. Nova probably has the slowest unit tests of any project, because it probably has the most, taking 7-8 minutes to run on a fast laptop. But compared to a roughly 1 hour 40 minute tempest run, yes, we could probably queue up all the tox envs on a single VM and have it easily complete before tempest. Those jobs would also benefit from sharing a pip cache on that VM, as 95% of the dependencies are probably common between the tox envs, modulo the Python version.
I would also support a commit message tag (or something) that tells zuul not to bother running CI right now. Or a way to go to zuul.o.o and yank a patch out.
Realizing of course that these suggestions come from someone who uses zuul in the most superficial way possible (like, I wouldn't know how to write a... job? playbook? with a gun to my head) so they're probably exponentially harder than using the thing Chris mentioned.
-efried
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers.
Apologies, I had nova in my head when I said this. For the placement repo specifically (at least as it stands today), running full tox locally is very fast, so you really have no excuse for pushing broken py/func. I would tentatively support stop-on-first-failure in placement only; but we should be on the lookout for a time when this tips the balance. (I hope that never happens, and I'm guessing Chris would agree with that.) -efried
On Mon, 25 Feb 2019, Eric Fried wrote:
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers.
Apologies, I had nova in my head when I said this. For the placement repo specifically (at least as it stands today), running full tox locally is very fast, so you really have no excuse for pushing broken py/func. I would tentatively support stop-on-first-failure in placement only; but we should be on the lookout for a time when this tips the balance. (I hope that never happens, and I'm guessing Chris would agree with that.)
I'm still not certain that we're talking about exactly the same thing. My proposal was not stop-on-first-failure. It is:

1. Run all the short duration zuul jobs, in the exact same way they run now: run each individual test, gather all individual failures, any individual test failure annotates the entire job as failed, but all tests are run, all failures are reported. If there is a failure here, zuul quits, votes -1.

2. If (and only if) all those short jobs pass, automatically run the long duration zuul jobs. If there is a failure here, zuul is done, votes -1.

3. If we reach here, zuul is still done, votes +1.

This is what https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies provides. In our case we would make the grenade and tempest jobs depend on the success of (most of) the others.

(I agree that if the unit and functional tests in placement ever get too slow to be no big deal to run locally, we've made an error that needs to be fixed. Similarly, if placement (in isolation) gets too complex to test (and experiment with) in an easy and local fashion, we've also made an error. Plenty of projects need to be more complex than placement and require different modes for experimentation and testing. At least for now, placement does not.)

-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On Tue, 2019-02-26 at 10:54 +0000, Chris Dent wrote:
On Mon, 25 Feb 2019, Eric Fried wrote:
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers.
Apologies, I had nova in my head when I said this. For the placement repo specifically (at least as it stands today), running full tox locally is very fast, so you really have no excuse for pushing broken py/func. I would tentatively support stop-on-first-failure in placement only; but we should be on the lookout for a time when this tips the balance. (I hope that never happens, and I'm guessing Chris would agree with that.)
I'm still not certain that we're talking about exactly the same thing. My proposal was not stop-on-first-failure. It is:
1. Run all the short duration zuul jobs, in the exact same way they run now: run each individual test, gather all individual failures, any individual test failure annotates the entire job as failed, but all tests are run, all failures are reported. If there is a failure here, zuul quits, votes -1.
2. If (and only if) all those short jobs pass, automatically run the long duration zuul jobs. If there is a failure here, zuul is done, votes -1.

^ is where the stop-on-first-failure comment came from. It's technically not first failure, but when I raised this topic in the past there was a strong preference not to conditionally skip some jobs if others fail, so that the developer gets as much feedback as possible. So the last sentence of the job.dependencies doc is the controversial point: "... and if one or more of them fail, this job will not be run." Tempest jobs are the hardest set of things to run locally, and people did not want to skip them for failures in things that are easy to run locally.
3. If we reach here, zuul is still done, votes +1.
This is what https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies provides. In our case we would make the grenade and tempest jobs depend on the success of (most of) the others.
(I agree that if the unit and functional tests in placement ever get too slow to be no big deal to run locally, we've made an error that needs to be fixed. Similarly if placement (in isolation) gets too complex to test (and experiment with) in an easy and local fashion, we've also made an error. Plenty of projects need to be more complex than placement and require different modes for experimentation and testing. At least for now, placement does not.)
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On Mon, Feb 25, 2019 at 02:36:44PM -0600, Eric Fried wrote:
I would also support a commit message tag (or something) that tells zuul not to bother running CI right now. Or a way to go to zuul.o.o and yank a patch out.
Note that because edits to zuul jobs (i.e. whatever is in .zuul.yaml) are applied to testing for that change, for WIP changes it's usually easy to just go in and edit out any and all "unrelated" jobs while you're in early iterations [1]. Obviously you put things back when things are ready for review. I think this covers your first point.

If you get it wrong, you can upload a new change and Zuul will stop active jobs and start working on the new change, which I think covers the second.

-i

[1] e.g. https://review.openstack.org/#/c/623137/6/.zuul.d/jobs.yaml
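For illustration, the kind of temporary edit meant here might look like this in the change's own .zuul.yaml (job names illustrative), with the commented-out jobs restored before the change leaves WIP:

    # .zuul.yaml on an early, WIP patch set
    - project:
        check:
          jobs:
            - openstack-tox-pep8
            - openstack-tox-py37
            # temporarily disabled while iterating; restore before review
            # - tempest-full-py3
            # - grenade-py3

Because the change's own .zuul.yaml is what Zuul tests, the trimmed job list takes effect immediately on that patch set.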
I attempted [0] to do that for tripleo-ci, but Zuul was (and still is) complaining about some weird graph-building things :/

See also the related topic [1] from the past.

[0] https://review.openstack.org/#/c/568543
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-March/127869.html

On 26.02.2019 1:20, Ian Wienand wrote:
On Mon, Feb 25, 2019 at 02:36:44PM -0600, Eric Fried wrote:
I would also support a commit message tag (or something) that tells zuul not to bother running CI right now. Or a way to go to zuul.o.o and yank a patch out.
Note that because edits to zuul jobs (i.e. whatever is in .zuul.yaml) are applied to testing for that change, for WIP changes it's usually easy to just go in and edit out any and all "unrelated" jobs while you're in early iterations [1]. Obviously you put things back when things are ready for review.
I think this covers your first point. If you get it wrong, you can upload a new change and Zuul will stop active jobs and start working on the new change, which I think covers the second.
-i
[1] e.g. https://review.openstack.org/#/c/623137/6/.zuul.d/jobs.yaml
-- Best regards, Bogdan Dobrelya, Irc #bogdando
Bogdan Dobrelya <bdobreli@redhat.com> writes:
I attempted [0] to do that for tripleo-ci, but Zuul was (and still is) complaining about some weird graph-building things :/
See also the related topic [1] from the past.
[0] https://review.openstack.org/#/c/568543 [1] http://lists.openstack.org/pipermail/openstack-dev/2018-March/127869.html
Thank you for linking to [1]. It's worth re-reading. Especially the part at the end. -Jim
On 26.02.2019 17:53, James E. Blair wrote:
Bogdan Dobrelya <bdobreli@redhat.com> writes:
I attempted [0] to do that for tripleo-ci, but Zuul was (and still is) complaining about some weird graph-building things :/
See also the related topic [1] from the past.
[0] https://review.openstack.org/#/c/568543 [1] http://lists.openstack.org/pipermail/openstack-dev/2018-March/127869.html
Thank you for linking to [1]. It's worth re-reading. Especially the part at the end.
-Jim
Yes, the part at the end is the best indeed. I'd amend the time priorities graph though, like this: CPU time < a developer's time < developers' time. That means burning some CPU and nodes in the pool as waste might benefit a single developer, but saving some CPU and nodes in the pool would benefit *developers* in many projects, as they'd get their job results off the waiting check queues faster :)

-- Best regards, Bogdan Dobrelya, Irc #bogdando
participants (10)
- Ben Nemec
- Bogdan Dobrelya
- Chris Dent
- Clark Boylan
- corvus@inaugust.com
- Eric Fried
- Ian Wienand
- Jeremy Stanley
- Sean Mooney
- Sorin Sbarnea