It happens with some regularity that I or others are asked for code change volume numbers on OpenStack to compare its activity level against similar numbers published by other open-source projects, and I pretty much always wind up needing to couch my answers with the fact that commit counts published by projects following different workflows aren't directly comparable. Most often, these other projects employ an additive pull request model common to GitHub and similar platforms, which causes many times more commits to appear in their branch histories than happens with a rebase or squash model. (Note that there *are* plenty of high-profile projects on GitHub like Kubernetes using more of a rebase model, with PR owners typically force-pushing changes to address reviewer comments or CI results, and also projects where it's their custom to squash all commits in a PR before merging, so in those cases our normal change counts are more applicable for ~1:1 comparisons.)

In order to be able to provide more useful data, I decided to try to find a way of approximating the "chaff" we discard when we merge the "wheat" of our rebased code changes. The Gerrit API provides information on the revision count for changes, so adding up the number of revisions for each merged change basically gives us the answer. OpenStack's election tooling already tallies counts of changes merged, so adding an aggregate revision counter to that was fairly trivial: https://review.opendev.org/698291
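For anyone who wants to reproduce this sort of tally without going through the election tooling, here's a minimal sketch of the kind of Gerrit REST API query involved. The project scoping and batch size below are just illustrative assumptions on my part; the real tallies also restrict the date range and the set of repositories according to governance data:

    #!/usr/bin/env python3
    # Rough sketch only (not the election tooling patch itself): tally patch
    # set revisions across merged changes via the Gerrit REST API.

    import json
    import urllib.parse
    import urllib.request

    GERRIT = 'https://review.opendev.org'
    # Illustrative query; the real tallies scope by year and official repos
    QUERY = 'status:merged project:openstack/nova'


    def merged_changes(query, batch=500):
        """Yield a ChangeInfo dict for every change matching the query."""
        start = 0
        while True:
            url = ('%s/changes/?q=%s&o=ALL_REVISIONS&n=%d&S=%d'
                   % (GERRIT, urllib.parse.quote(query), batch, start))
            raw = urllib.request.urlopen(url).read().decode('utf-8')
            # Gerrit prefixes its JSON responses with a )]}' line (XSSI guard)
            results = json.loads(raw.split('\n', 1)[1])
            yield from results
            if not results or '_more_changes' not in results[-1]:
                return
            start += len(results)


    change_count = revision_count = 0
    for change in merged_changes(QUERY):
        change_count += 1
        # With o=ALL_REVISIONS, every patch set appears in the revisions map
        revision_count += len(change['revisions'])

    print('%d revisions / %d changes ~= %.2f'
          % (revision_count, change_count, revision_count / change_count))

Grouping the same totals by the team owning each repository is roughly how the per-team figures further down can be derived.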
A word on merge commits: while a direct count of Git commits will be additionally inflated by the merge commits used to stitch changes into a branch (approaching a 1:1 ratio with the number of changes merged in higher-volume projects, except possibly in projects squashing or cherry-picking/rebasing at merge time), GitHub's commit counts explicitly exclude merge commits in order to provide more useful metrics, so we don't need to factor them into our calculations: https://developer.github.com/v3/repos/statistics/#statistics-exclude-some-ty...

So with all that explanation out of the way (and thanks to those of you who read this far!), it's time for some numbers. Looking at the past five years, here's the breakdown...

1409697 revisions / 459770 changes ~= 3

Now, this was calculated using the current OpenStack governance data, so the exact change counts aren't official numbers: they could include data from a slightly different set of Git repositories than were official in those years. The goal of this exercise was only to obtain a rough coefficient, though, so I think that's a reasonable simplification.

This "GitHub coefficient" of 3 is then a number by which OpenStack change volume can be multiplied to provide a rough approximation for comparison with activity numbers for projects following the common GitHub iterative PR workflow (at least for projects which see similar levels of PR updating to OpenStack's change revision frequency).

Out of curiosity, I also took a look at how that coefficient has evolved over time...

2015: 3.35
2016: 3.12
2017: 3.05
2018: 2.77
2019: 2.76

It's interesting to note that OpenStack's overall chaff-to-wheat ratio has been trending downward. This could be due to a number of factors. Are contributors getting better instruction and feedback to help them understand what reviewers expect? Is our project gating automation catching more errors and wasting less developer time on each change? Are reviewers possibly becoming more lax and fixing nits in subsequent changes rather than demanding more revisions to otherwise "good enough for now" changes? Are we seeing a rise in projects with lower affiliation diversity resulting in changes merging with minimal or no review? It could be any or all of these.

I tried to dig back even farther to see if this pattern was consistent. What I found, in retrospect, should perhaps not be all that surprising...

2011: 1.79
2012: 2.12
2013: 2.72
2014: 3.18

Okay, so 2011/2012 are much lower, perhaps because contributors were still for the most part getting used to the ideas of code review and pre-merge testing in the first few years of the project. The data we have in Gerrit about this earlier period is also somewhat less reliable anyway, but it's worth pointing out that the curve for revisions-per-change roughly follows the curve for total change volume (peaking around 2014-2016). So maybe the answer is that when we try to move faster we become less efficient? Additional rebases due to a higher incidence of merge conflicts could be an explanation there.

The patch which adds revision counting makes it possible to calculate change revisions for deliverables of different teams as well. These are the five teams with the highest and the five with the lowest coefficients for this year...

12.5: I18n
5.58: Nova
5.33: OpenStack Helm
5.15: Cyborg
4.27: Storlets
...
1.74: SIGs (constructed "team" encompassing all SIG-owned repos)
1.73: Murano
1.64: Release Management
1.56: Puppet OpenStack
1.56: Requirements

You can of course draw all sorts of conclusions from a comparison like this. Clearly there's something about the I18n team's changes which results in a very high revision count, but it's not immediately clear to me what that is (they only have two repositories, one of which is basically unused and the other mostly just merges infrequent bot-proposed changes). It's also possible that the Nova team's reputation for being thorough reviewers is a well-deserved one. On the other hand, some teams like Release Management and Requirements with well-structured, policy-oriented changes tend to merge at the first or second iteration on average.

Hopefully some of you find this useful. For me, it's an interesting way to see evidence of some of the additional work our community is doing when it comes to making changes to its software, which isn't necessarily evident from the basic change counts we normally see published. It likely raises more questions than it answers, but I think the introspection it drives is also a healthy exercise for the community.

One final note: I've got a trimmed-down and compressed archive of the data from which the above numbers were extracted, which I can forward to anyone who wants to do their own digging; just let me know if you would like a copy. I was going to attach it to this analysis, but that would have resulted in a >300KB message to the mailing list, so I thought better of that plan.

-- 
Jeremy Stanley