[openstack-dev] [nova] Risk prediction model for OpenStack

Matt Riedemann mriedemos at gmail.com
Wed Apr 5 15:33:21 UTC 2017


On 4/5/2017 9:00 AM, Jeremy Stanley wrote:
> On 2017-04-05 14:00:59 +0800 (+0800), 林泽燕 wrote:
> [...]
>> I wonder if I could show you my study, including some metrics for
>> the prediction model and a visualization tool.
> [...]
>
> I want to start out thanking you for your research and interest in
> OpenStack's development practices. I love that our contribution
> model enables such scientific analysis, a sometimes less recognized
> benefit of our community's choice to work entirely in the open. This
> specific study is also very insightful and well-presented.
>
>> In this release, 36 developers left the development of this file
>> (they made contributions in last release but not this one).
>> Developers leaving a code file deprive the file of the knowledge
>> of the decisions they have made.
> [...]
>
> One potentially influential aspect of our development model is that
> we place a heavy importance on code review. For any patch to make it
> into a branch under official revision control, it must first be
> reviewed by multiple experienced, long-standing contributors to that
> repository. Our hope is that even though some developers may cease
> contributing new patches to a file, some of them would still be
> reviewing, guiding and refining changes proposed by newer
> contributors. It doesn't seem like this behavior was captured in
> your analysis, or alternatively the fact that your model yielded
> relatively accurate predictions could imply that our review process
> has little impact on defects introduced by new commits.
>
> If you do at some point wish to try integrating review metrics into
> your analysis, our code review system has a REST API you can
> leverage, and much of the data you'd likely be interested in can be
> queried via anonymous methods such that you wouldn't even need to
> create an account. Documentation for the interface is available at
> https://review.openstack.org/Documentation/rest-api.html and we also
> have documentation of our general developer workflow at
> https://docs.openstack.org/infra/manual/developers.html as well as
> some background on our development model at
> https://docs.openstack.org/project-team-guide/open-development.html
> if that helps.
>
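
For reference, a minimal sketch of such an anonymous query in Python
(this assumes the third-party requests library; the query string is
only an example). Note that Gerrit prefixes its JSON responses with
")]}'" to defeat cross-site script inclusion, so that first line has
to be stripped before parsing:

    import json

    import requests  # third-party HTTP library (pip install requests)

    GERRIT = "https://review.openstack.org"

    def query_changes(query, limit=25):
        """Fetch up to `limit` changes matching a Gerrit query string."""
        resp = requests.get(GERRIT + "/changes/",
                            params={"q": query, "n": limit})
        resp.raise_for_status()
        # Drop the ")]}'" anti-XSSI prefix line, then parse the JSON.
        return json.loads(resp.text.split("\n", 1)[1])

    for change in query_changes("project:openstack/nova status:merged"):
        print(change["_number"], change["subject"])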

Jeremy pointed out what I was going to mention, which was the lack of 
input on code reviews. Each major component of Nova, such as the virt 
drivers, generally has a subteam, or at least some subject-matter 
expert, that is consulted or involved in reviewing code contributions. 
So while those people may not be making the changes to a component 
themselves, they should be reviewing those changes. For example, with 
nova/virt/libvirt/driver.py, danpb was historically the main core 
reviewer and maintainer for that code, so while he didn't write 
everything, he was reviewing a lot of the contributions.
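
Using the query helper sketched above, that kind of review activity can 
be scoped to a single file with Gerrit's file: search operator (the 
path here is the one under discussion; adding a reviewer: term would 
narrow the results to one person's reviews):

    # Merged nova changes that touched the libvirt driver module.
    changes = query_changes(
        "project:openstack/nova status:merged "
        "file:^nova/virt/libvirt/driver.py", limit=100)
    print("fetched", len(changes),
          "merged changes touching the libvirt driver")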

Some of the files are also skewed a bit, and you might want to take 
into account the logic paths in a module when deciding whether to 
exclude it. For example, exception.py and the various opts.py modules 
are outliers. They are basically files that contain constants rather 
than logic, so the chance of those having an actual owner is small, 
but so is the risk of bugs. They will also show high contributor 
diversity given how frequently they are touched.
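
As a rough illustration of what filtering on "logic paths" could look 
like, here is a sketch using Python's standard ast module that counts 
branching constructs per top-level statement; declaration-only files 
like exception.py or the opts.py modules should score near zero (the 
node list and threshold are arbitrary choices, not a vetted metric):

    import ast

    def logic_density(path):
        """Branching constructs per top-level statement, as a crude
        proxy for how much control-flow logic a module contains."""
        with open(path) as f:
            tree = ast.parse(f.read(), filename=path)
        branches = sum(isinstance(node, (ast.If, ast.For, ast.While,
                                         ast.Try, ast.With))
                       for node in ast.walk(tree))
        return branches / max(len(tree.body), 1)

    # e.g. exclude files where logic_density(path) < 0.05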

I'm not sure I understood the timeline graphs, or the point they are 
making. We definitely have an ebb and flow of contributions based on 
the release schedule: feature development and new code are loaded 
toward the front of the release, and then are supposed to be cut off 
around the third milestone at the end of the release so we can 
stabilize and focus on bugs.

In general some of this is common sense. When one person "owns" most 
of a module in a piece of software, they are the expert, so bugs due 
to a lack of understanding of the bigger picture of that module, or 
how it fits into the larger system, should be mitigated. When that 
person leaves, if others on the team don't have the domain knowledge, 
there are going to be mistakes. We definitely have parts of the nova 
codebase that we know are just very touchy and error prone, and we 
avoid changing those if at all possible (block device mappings, 
quotas, neutronv2.api, nova-network and cells v1 come to mind). This 
is hard in a big open source project, but it is also why we have high 
standards for core reviewers (those who can approve code 
contributions) and a ridiculous amount of continuous integration 
testing.
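
That notion of "ownership" can be approximated straight from git 
history; a sketch (the most-frequent-author share is just one possible 
proxy, and author names are not deduplicated across email changes):

    import subprocess
    from collections import Counter

    def top_author_share(repo, path):
        """Fraction of commits touching `path` made by its most
        frequent author -- a rough proxy for single-person ownership."""
        out = subprocess.run(
            ["git", "-C", repo, "log", "--format=%an", "--", path],
            capture_output=True, text=True, check=True).stdout
        authors = Counter(out.splitlines())
        total = sum(authors.values())
        if not total:
            return 0.0
        return authors.most_common(1)[0][1] / total

    # e.g. top_author_share("nova", "nova/virt/libvirt/driver.py")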

-- 

Thanks,

Matt


