[openstack-dev] [TripleO] Review metrics - what do we want to measure?

Robert Collins robertc at robertcollins.net
Wed Sep 3 00:58:11 UTC 2014


On 14 August 2014 11:03, James Polley <jp at jamezpolley.com> wrote:
> In recent history, we've been looking each week at stats from
> http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
> gauge on how our review pipeline is tracking.
>
> The main stats we've been tracking have been the "since the last revision
> without -1 or -2". I've included some history at [1], but the summary is
> that our 3rd quartile has slipped from 13 days to 16 days over the last 4
> weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
> from 4 a month ago) and median is unchanged around 7 days.
>
> There was lots of discussion in our last meeting about what could be causing
> this[2]. However, the thing we wanted to bring to the list for the
> discussion is:
>
> Are we tracking the right metric? Should we be looking to something else to
> tell us how well our pipeline is performing?
>
> The meeting logs have quite a few suggestions about ways we could tweak the
> existing metrics, but if we're measuring the wrong thing that's not going to
> help.
>
> I think that what we are looking for is a metric that lets us know whether
> the majority of patches are getting feedback quickly. Maybe there's some
> other metric that would give us a good indication?

If we review all patches quickly and land none, that's bad too :).

For the reviewers specifically, I think we need a metric (or metrics) that:
 - doesn't go bad when submitters go AWOL, don't respond, etc.
   - including when they come back - our stats shouldn't jump hugely
because an old review was resurrected
 - when good, means submitters are getting feedback
 - flags inventory - things we'd be happy to have landed that haven't
   - including things with a -1 from non-core reviewers (*)

(*) I often see -1s on things a core reviewer wouldn't -1, due to the
learning curve involved in becoming core.

So, as Ben says, I think we need to address the its-not-a-vote issue
as a priority; it has tripped us up in lots of ways.

I think we need to discount -workflow patches where that was set by
the submitter, which AFAICT we don't do today.
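
Something along these lines is what I have in mind - a rough sketch
against Gerrit's REST API, not what reviewstats actually does, and the
project name below is only an example:

import json
import requests

GERRIT = "https://review.openstack.org"

def open_changes(project):
    # Gerrit prefixes its JSON responses with ")]}'" to defeat XSSI,
    # so drop the first line before parsing.
    url = (GERRIT + "/changes/?q=status:open+project:" + project
           + "&o=DETAILED_LABELS")
    return json.loads(requests.get(url).text.split("\n", 1)[1])

def self_workflowed(change):
    # True if the submitter set Workflow -1 (work in progress) on their
    # own change - those shouldn't count as waiting for review.
    owner = change["owner"]["_account_id"]
    votes = change.get("labels", {}).get("Workflow", {}).get("all", [])
    return any(v.get("value", 0) < 0 and v.get("_account_id") == owner
               for v in votes)

reviewable = [c for c in open_changes("openstack/tripleo-incubator")
              if not self_workflowed(c)]
print("%d open changes that aren't self-WIP'd" % len(reviewable))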

Looking at current stats:
Longest waiting reviews (based on oldest rev without -1 or -2):

54 days, 2 hours, 41 minutes https://review.openstack.org/106167
(Keystone/LDAP integration)
That patch had a -1 on Aug 16 at 1:23 AM, but it was quickly turned into a +2.

So this patch had a -1, then after discussion it became a +2. And it's
evolved multiple times.

What should we be saying here? Clearly it's had little review input
over its life, so I think it's sadly accurate.

I wonder if a big chunk of our sliding quartile is just us not
reviewing the oldest reviews.
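
Back-of-the-envelope illustration (made-up wait times, not real Gerrit
data - just the quartile arithmetic): a handful of very old, unreviewed
changes is enough to drag the 3rd quartile up.

import numpy as np

# hypothetical per-change wait times in days
wait_days = [0.5, 1, 1, 2, 3, 5, 7, 7, 9, 13, 16, 30, 54]
print(np.percentile(wait_days, [25, 50, 75]))       # 1st quartile, median, 3rd
print(np.percentile(wait_days[:-2], [25, 50, 75]))  # drop the two oldest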

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


