[openstack-dev] [nova] A modest proposal to reduce reviewer load

Matthew Booth mbooth at redhat.com
Thu Jun 19 12:42:02 UTC 2014

On 19/06/14 13:22, Mark McLoughlin wrote:
> On Thu, 2014-06-19 at 09:34 +0100, Matthew Booth wrote:
>> On 19/06/14 08:32, Mark McLoughlin wrote:
>>> Hi Armando,
>>> On Tue, 2014-06-17 at 14:51 +0200, Armando M. wrote:
>>>> I wonder what the turnaround of trivial patches actually is, I bet you
>>>> it's very very small, and as Daniel said, the human burden is rather
>>>> minimal (I would be more concerned about slowing them down in the
>>>> gate, but I digress).
>>>> I think that introducing a two-tier level for patch approval can only
>>>> mitigate the problem, but I wonder if we'd need to go a lot further,
>>>> and rather figure out a way to borrow concepts from queueing theory so
>>>> that they can be applied in the context of Gerrit. For instance
>>>> Little's law [1] says:
>>>> "The long-term average number of customers (in this context reviews)
>>>> in a stable system L is equal to the long-term average effective
>>>> arrival rate, λ, multiplied by the average time a customer spends in
>>>> the system, W; or expressed algebraically: L = λW."
>>>> L can be used to determine the number of core reviewers that a project
>>>> will need at any given time, in order to meet a certain arrival rate
>>>> and average time spent in the queue. If the number of core reviewers
>>>> is a lot less than L then that core team is understaffed and will need
>>>> to increase.
>>>> If we figured out how to model and measure Gerrit as a queuing system,
>>>> then we could improve its performance a lot more effectively; for
>>>> instance, this idea of privileging trivial patches over longer patches
>>>> has roots in a popular scheduling policy [3] for  M/G/1 queues, but
>>>> that does not really help aging of 'longer service time' patches and
>>>> does not have a preemption mechanism built-in to avoid starvation. 
>>>> Just a crazy opinion...
>>>> Armando
>>>> [1] - http://en.wikipedia.org/wiki/Little's_law
>>>> [2] - http://en.wikipedia.org/wiki/Shortest_job_first
>>>> [3] - http://en.wikipedia.org/wiki/M/G/1_queue
>>> This isn't crazy at all. We do have a problem that surely could be
>>> studied and solved/improved by applying queueing theory or lessons from
>>> fields like lean manufacturing. Right now, we're simply applying our
>>> intuition and the little I've read about these sorts of problems is that
>>> your intuition can easily take you down the wrong path.
>>> There's a bunch of things that occur just glancing through those
>>> articles:
>>>   - Do we have an unstable system? Would it be useful to have arrival 
>>>     and exit rate metrics to help highlight this? Over what time period 
>>>     would those rates need to be averaged to be useful? Daily, weekly, 
>>>     monthly, an entire release cycle?
>>>   - What are we trying to optimize for? The length of time in the 
>>>     queue? The number of patches waiting in the queue? The response 
>>>     time to a new patch revision?
>>>   - We have a single queue, with a bunch of service nodes with a wide 
>>>     variance between their service rates, very little in the way of
>>>     scheduling policy, a huge rate of service nodes sending jobs back 
>>>     for rework, a cost associated with maintaining a job while it sits 
>>>     in the queue, the tendency for some jobs to disrupt many other jobs 
>>>     with merge conflicts ... not simple.
>>>   - Is there any sort of natural limit in our queue size that makes the 
>>>     system stable - e.g. do people naturally just stop submitting
>>>     patches at some point?
>>> My intuition on all of this lately is that we need some way to model and
>>> experiment with this queue, and I think we could make some interesting
>>> progress if we could turn it into a queueing network rather than a
>>> single, extremely complex queue.
>>> Say we had a front-end for gerrit which tracked which queue a patch is
>>> in, we could experiment with things like:
>>>   - a triage queue, with non-cores signed up as triagers looking for 
>>>     obvious mistakes and choosing the next queue for a patch to enter 
>>>     into
>>>   - queues having a small number of cores signed up as owners - e.g. 
>>>     high priority bugfix, API, scheduler, object conversion, libvirt
>>>     driver, vmware driver, etc.
>>>   - we'd allow for a large number of queues so that cores could aim for 
>>>     an "inbox zero" approach on individual queues, something that would 
>>>     probably help keep cores motivated.
>>>   - we could apply different scheduling policies to each of the 
>>>     different queues - i.e. explicit guidance for cores about which 
>>>     patches they should pick off the queue next.
>>>   - we could track metrics on individual queues as well as the whole 
>>>     network, identifying bottlenecks and properly recognizing which 
>>>     reviewers are doing a small number of difficult reviews versus 
>>>     those doing a high number of trivial reviews.
>>>   - we could require some queues to feed into a final approval queue 
>>>     where some people are responsible for giving an approved patch a 
>>>     final sanity check - i.e. there would be a class of reviewer with 
>>>     good instincts who quickly churn through already-reviewed patches 
>>>     looking for the kind of mistakes people tend to mistake when 
>>>     they're down in the weeds.
>>>   - explicit queues for large, cross-cutting changes like coding style 
>>>     changes. Perhaps we could stop servicing these queues at certain
>>>     points in the cycles, or reduce the rate at which they are 
>>>     serviced.
>>>   - we could include specs and client patches in the same network so 
>>>     that they prioritized in the same way.
>>> Lots of ideas, none of it is trivial ... but perhaps it'll spark
>>> someone's interest :)
>> This is all good stuff, but by the sounds of it experimenting in gerrit
>> isn't likely to be simple.
> Right, which is why I said "a front-end for gerrit" ... this would sit
> in front of gerrit, at least to begin with.
>> Remember, though, that the relevant metric is code quality, not review
>> rate.
> Not sure what conclusion you're driving towards there - obviously that's
> true, but ... ignore review rate? measure code quality how?

I mean that there are other ways to improve code quality than review. If
we find ourselves putting a lot of effort into this, we may be better
served by looking for low hanging fruit elsewhere.


Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

More information about the OpenStack-dev mailing list