On 2018-12-09 23:14:37 +0900 (+0900), Ghanshyam Mann wrote: [...]
We can optimize node usage by removing the job from the running queue on the first failure, instead of completing the full run, and then releasing the node. This is a trade-off against getting all the failures at once and fixing them together, but I am not sure that is the case all the time. For example, if a change has a pep8 error, there is no need to run the integration test jobs on it. This can at least save nodes to some extent.
I can recall plenty of times where I've pushed a change which failed pep8 on some non-semantic whitespace complaint and also had unit test or integration test failures. In those cases it's quite obvious that the pep8 failure reason couldn't have been the reason for the other failed jobs so seeing them all saved me wasting time on additional patches and waiting for more rounds of results.

For that matter, a lot of my time as a developer (or even as a reviewer) is saved by seeing which clusters of jobs fail for a given change. For example, if I see all unit test jobs fail but integration test jobs pass I can quickly infer that there may be issues with a unit test that's being modified and spend less time fumbling around in the dark with various logs.

It's possible we can save some CI resource consumption with such a trade-off, but doing so comes at the expense of developer and reviewer time so we have to make sure it's worthwhile. There was a point in the past where we did something similar (only run other jobs if a canary linter job passed), and there are good reasons why we didn't continue it.
-- 
Jeremy Stanley