It happens with some regularity that I or others are asked for code change volume numbers on OpenStack to compare its activity level against similar numbers published by other open-source projects, and I pretty much always wind up needing to couch my answers with the fact that commit counts published by projects following different workflows aren't directly comparable. Most often, these other projects employ an additive pull request model common to GitHub and similar platforms, which causes many times more commits to appear in their branch histories than happens with a rebase or squash model. (Note that there *are* plenty of high-profile projects on GitHub like Kubernetes using more of a rebase model, with PR owners typically force-pushing changes to address reviewer comments or CI results, and also projects where it's their custom to squash all commits in a PR before merging, so in those cases our normal change counts are more applicable for ~1:1 comparisons.)

In order to be able to provide more useful data, I decided to try to find a way of approximating the "chaff" we discard when we merge the "wheat" of our rebased code changes. The Gerrit API provides information on the revision count for changes, so adding up the number of revisions for each merged change basically gives us the answer. OpenStack's election tooling already tallies counts of changes merged, so adding an aggregate revision counter to that was fairly trivial: https://review.opendev.org/698291
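For anyone who wants to reproduce this sort of tally without going through the election tooling, here's a minimal sketch of the kind of Gerrit REST API query involved. The project scoping and batch size below are just illustrative assumptions on my part; the real tallies also restrict the date range and the set of repositories according to governance data:

    #!/usr/bin/env python3
    # Rough sketch only (not the election tooling patch itself): tally patch
    # set revisions across merged changes via the Gerrit REST API.

    import json
    import urllib.parse
    import urllib.request

    GERRIT = 'https://review.opendev.org'
    # Illustrative query; the real tallies scope by year and official repos
    QUERY = 'status:merged project:openstack/nova'


    def merged_changes(query, batch=500):
        """Yield a ChangeInfo dict for every change matching the query."""
        start = 0
        while True:
            url = ('%s/changes/?q=%s&o=ALL_REVISIONS&n=%d&S=%d'
                   % (GERRIT, urllib.parse.quote(query), batch, start))
            raw = urllib.request.urlopen(url).read().decode('utf-8')
            # Gerrit prefixes its JSON responses with a )]}' line (XSSI guard)
            results = json.loads(raw.split('\n', 1)[1])
            yield from results
            if not results or '_more_changes' not in results[-1]:
                return
            start += len(results)


    change_count = revision_count = 0
    for change in merged_changes(QUERY):
        change_count += 1
        # With o=ALL_REVISIONS, every patch set appears in the revisions map
        revision_count += len(change['revisions'])

    print('%d revisions / %d changes ~= %.2f'
          % (revision_count, change_count, revision_count / change_count))

Grouping the same totals by the team owning each repository is roughly how the per-team figures further down can be derived.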
A word on merge commits: while a direct count of Git commits will be additionally inflated by the merge commits used to stitch changes into a branch (approaching a 1:1 ratio with the number of changes merged in higher-volume projects, except possibly in projects squashing or cherry-picking/rebasing at merge time), GitHub's commit counts explicitly exclude merge commits in order to provide more useful metrics, so we don't need to factor them into our calculations: https://developer.github.com/v3/repos/statistics/#statistics-exclude-some-ty...

So with all that explanation out of the way (and thanks to those of you who read this far!), it's time for some numbers. Looking at the past five years, here's the breakdown...

1409697 revisions / 459770 changes ~= 3

Now, this was calculated using the current OpenStack governance data, so the exact change counts aren't official numbers: they could include data from a slightly different set of Git repositories than were official in those years. The goal of this exercise was only to obtain a rough coefficient, though, so I think that's a reasonable simplification.

This "GitHub coefficient" of 3 is then a number by which OpenStack change volume can be multiplied to provide a rough approximation for comparison with activity numbers for projects following the common GitHub iterative PR workflow (at least for projects which see similar levels of PR updating to OpenStack's change revision frequency).

Out of curiosity, I also took a look at how that coefficient has evolved over time...

2015: 3.35
2016: 3.12
2017: 3.05
2018: 2.77
2019: 2.76

It's interesting to note that OpenStack's overall chaff-to-wheat ratio has been trending downward. This could be due to a number of factors. Are contributors getting better instruction and feedback to help them understand what reviewers expect? Is our project gating automation catching more errors and wasting less developer time on each change? Are reviewers possibly becoming more lax and fixing nits in subsequent changes rather than demanding more revisions to otherwise "good enough for now" changes? Are we seeing a rise in projects with lower affiliation diversity resulting in changes merging with minimal or no review? It could be any or all of these.

I tried to dig back even farther to see if this pattern was consistent. What I found, in retrospect, should perhaps not be all that surprising...

2011: 1.79
2012: 2.12
2013: 2.72
2014: 3.18

Okay, so 2011/2012 are much lower, perhaps because contributors were still for the most part getting used to the ideas of code review and pre-merge testing in the first few years of the project. The data we have in Gerrit about this earlier period is also somewhat less reliable anyway, but it's worth pointing out that the curve for revisions-per-change roughly follows the curve for total change volume (peaking around 2014-2016). So maybe the answer is that when we try to move faster we become less efficient? Additional rebases due to a higher incidence of merge conflicts could be an explanation there.

The patch which adds revision counting makes it possible to calculate change revisions for deliverables of different teams as well. These are the five teams with the highest and the five with the lowest coefficients for this year...

12.5: I18n
5.58: Nova
5.33: OpenStack Helm
5.15: Cyborg
4.27: Storlets
...
1.74: SIGs (constructed "team" encompassing all SIG-owned repos)
1.73: Murano
1.64: Release Management
1.56: Puppet OpenStack
1.56: Requirements

You can of course draw all sorts of conclusions from a comparison like this. Clearly there's something about the I18n team's changes which results in a very high revision count, but it's not immediately clear to me what that is (they only have two repositories, one of which is basically unused and the other mostly just merges infrequent bot-proposed changes). It's also possible that the Nova team's reputation for being thorough reviewers is a well-deserved one. On the other hand, some teams like Release Management and Requirements with well-structured, policy-oriented changes tend to merge at the first or second iteration on average.

Hopefully some of you find this useful. For me, it's an interesting way to see evidence of some of the additional work our community is doing when it comes to making changes to its software, which isn't necessarily evident from the basic change counts we normally see published. It likely raises more questions than it answers, but I think the introspection it drives is also a healthy exercise for the community.

One final note: I've got a trimmed-down and compressed archive of the data from which the above numbers were extracted, which I can forward to anyone who wants to do their own digging; just let me know if you would like a copy. I was going to attach it to this analysis, but that would have resulted in a >300KB message to the mailing list, so I thought better of that plan.

-- 
Jeremy Stanley