[openstack-dev] Introducing the NNFI scheduler for Zuul
James E. Blair
jeblair at openstack.org
Thu Sep 26 17:10:08 UTC 2013
We recently made a change to Zuul's scheduling algorithm (how it
determines which changes to combine and test together). Now when a
change fails tests (or has a merge conflict), Zuul will move it out of
the series of changes that it is stacking together to be tested, but it
will still keep that change's position in the queue. Jobs for changes
behind it will be restarted without the failed change in their proposed
repo states. And if something later fails ahead of it, Zuul will once
again put it back into the stream of changes it's testing and give it
another chance.
To visualize this, we've updated the status screen to include a tree
view:
http://status.openstack.org/zuul/
(If you already have that loaded, be sure to hit reload.)
In Zuul, this is called the Nearest Non-Failing Item (NNFI) algorithm
because, in short, each item in a queue is at all times being tested
on top of the nearest non-failing item ahead of it in the queue.
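To make the rule concrete, here is a minimal sketch in Python. It is not
Zuul's actual code: the function name, the list-of-names queue model, and
the change names A through D are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def nearest_non_failing(queue: List[str], failing: Set[str]) -> Dict[str, Optional[str]]:
    """Map each change to the nearest non-failing change ahead of it.

    queue[0] is the head of the queue. A change with nothing non-failing
    ahead of it maps to None (it is tested against the branch tip alone).
    This is an illustrative model, not Zuul's implementation.
    """
    base: Dict[str, Optional[str]] = {}
    last_ok: Optional[str] = None
    for change in queue:
        base[change] = last_ok
        # A failing change keeps its place in the queue, but nothing
        # behind it is stacked on top of it.
        if change not in failing:
            last_ok = change
    return base


# Hypothetical queue: B has failed, so C is tested on top of A, not B.
bases = nearest_non_failing(["A", "B", "C", "D"], failing={"B"})
```

In this model B still holds its position and is still based on A, which is
why it can rejoin the stream if something ahead of it fails later.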
On the infrastructure side, this is going to drive our use of cloud
resources even more, as Zuul will now try to run as many jobs as it can,
continuously. Every time a change fails, all of the jobs for changes
behind it will be aborted and restarted with a new proposed future
state.
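One way to picture which jobs get restarted is to compare each change's
proposed future state before and after a failure. The sketch below is only
a model of that idea, not how Zuul implements it; the function names and
the tuple representation of a proposed state are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def proposed_states(queue: List[str], failing: Set[str]) -> Dict[str, Tuple[str, ...]]:
    """Return, for each change, the stack of changes it is tested with:
    every non-failing change ahead of it, followed by the change itself."""
    states: Dict[str, Tuple[str, ...]] = {}
    ok_ahead: List[str] = []
    for change in queue:  # queue[0] is the head of the queue
        states[change] = tuple(ok_ahead) + (change,)
        if change not in failing:
            ok_ahead.append(change)
    return states


def changes_to_restart(queue: List[str], failing_before: Set[str],
                       failing_after: Set[str]) -> List[str]:
    """List the changes whose proposed state differs after new failures,
    meaning their running jobs are stale and must be restarted."""
    old = proposed_states(queue, failing_before)
    new = proposed_states(queue, failing_after)
    return [c for c in queue if old[c] != new[c]]


queue = ["A", "B", "C", "D"]
# B fails: everything behind B is restarted without B.
after_b_fails = changes_to_restart(queue, set(), {"B"})
# Later A fails too: B gets another chance, now without A.
after_a_fails = changes_to_restart(queue, {"B"}, {"A", "B"})
```

Under these assumptions, a failure of B restarts C and D, and a later
failure of A restarts B, C and D, with B re-tested without A ahead of it.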
For developers, this means that changes should land faster, with more
throughput overall, as Zuul won't be waiting as long to re-test changes
after a job has failed. And that's what this is ultimately about --
virtual machines are cheap compared to developer time, so the more
velocity our automated tests can sustain, the more velocity our project
can achieve.
-Jim
(PS: There is a known problem with the status page not being able to
display the tree correctly while Zuul is in the middle of recalculating
the change graph. That should be fixed by next week, but in the
meantime, just enjoy the show.)