On Tue, 2019-02-26 at 10:54 +0000, Chris Dent wrote:
On Mon, 25 Feb 2019, Eric Fried wrote:
-1 to serializing jobs with stop-on-first-failure. Human time (having to iterate fixes one failed job at a time) is more valuable than computer time. That's why we make computers.
Apologies, I had nova in my head when I said this. For the placement repo specifically (at least as it stands today), running full tox locally is very fast, so you really have no excuse for pushing broken py/func. I would tentatively support stop-on-first-failure in placement only; but we should be on the lookout for a time when this tips the balance. (I hope that never happens, and I'm guessing Chris would agree with that.)
I'm still not certain that we're talking about exactly the same thing. My proposal was not stop-on-first-failure. It is:
1. Run all the short duration zuul jobs, in the exact same way they run now: run each individual test, gather all individual failures, any individual test failure annotates the entire job as failed, but all tests are run, all failures are reported. If there is a failure here, zuul quits, votes -1.
2. If (only if) all those short jobs run, automatically run the long duration zuul jobs. If there is a failure here, zuul is done, votes -1.
^ This is where the stop-on-first-failure comment came from. It's technically not first failure, but when I raised this topic in the past there was a strong preference not to conditionally skip some jobs if others fail, so that the developer gets as much feedback as possible. So the last sentence in the job.dependencies documentation is the controversial point: "... and if one or more of them fail, this job will not be run." Tempest jobs are the hardest set of things to run locally, and people did not want to skip them for failures in things that are easy to run locally.
3. If we reach here, zuul is still done, votes +1.
This is what https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies provides. In our case we would make the grenade and tempest jobs depend on the success of (most of) the others.
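As a rough illustration of the three steps above, here is a minimal Python sketch of the gating logic. It is not how Zuul is implemented or configured (Zuul expresses this declaratively through job.dependencies); the names run_stage and vote, and the idea of a job as a callable returning True on success, are assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a job is a callable that returns True on success.
Job = Callable[[], bool]


def run_stage(jobs: Dict[str, Job]) -> List[str]:
    """Run every job in the stage and return the names of the ones that failed.

    No job is skipped because a sibling failed: all failures are gathered.
    """
    return [name for name, job in jobs.items() if not job()]


def vote(short_jobs: Dict[str, Job], long_jobs: Dict[str, Job]) -> Tuple[int, List[str]]:
    """Return (vote, failed job names) for the two-stage proposal."""
    # Step 1: run all short jobs; any failure ends the run with -1.
    failed = run_stage(short_jobs)
    if failed:
        return -1, failed
    # Step 2: only when every short job succeeded, run the long jobs.
    failed = run_stage(long_jobs)
    if failed:
        return -1, failed
    # Step 3: everything succeeded.
    return 1, []
```

In this model a unit test failure still reports every failing short job, but the long jobs (grenade, tempest) never start, which is exactly the trade-off being debated.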
(I agree that if the unit and functional tests in placement ever get too slow to be no big deal to run locally, we've made an error that needs to be fixed. Similarly if placement (in isolation) gets too complex to test (and experiment with) in an easy and local fashion, we've also made an error. Plenty of projects need to be more complex than placement and require different modes for experimentation and testing. At least for now, placement does not.)
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent