Hi Kevin,<div><br></div><div>I believe that code ownership reflects module size to some extent. When the code ownership is small, many commits have been made by a large group of contributors, and thus the module is likely to be quite large.</div><div>I also think the number of bugs can be used to identify risky files for developers and managers, and can act as an indicator of the workload needed to improve the quality of a file, so that a development team can estimate the workload for each file and adjust its work priorities.</div><div>I hope this is clear. Thank you for your attention.</div><div><br></div><div>Zoey Lin<br><br><br><blockquote name="replyContent" class="ReferenceQuote" style="padding-left:5px;margin-left:5px;border-left:#b6b6b6 2px solid;margin-right:0">-----Original Message-----<br>
<b>From:</b><span id="rc_from">"Kevin Benton" <kevin@benton.pub></span><br>
<b>Sent:</b><span id="rc_senttime">2017-04-05 22:17:08 (Wednesday)</span><br>
<b>To:</b> "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org><br>
<b>Cc:</b> <br>
<b>Subject:</b> Re: [openstack-dev] [neutron] Risk prediction model for OpenStack<br><br><div dir="ltr">Thanks for this analysis. So one thing that jumps out at me right away is the correlation of this with the module size. ovs_neutron_agent.py is one of the biggest modules (if not the biggest non-test module) in Neutron, so if you don't control for line count in the analysis, this one would come out on top even if it had the same code quality (bugs per line) as other modules. How do you deal with module size?<div><br></div><div>Cheers,</div><div>Kevin Benton</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 4, 2017 at 11:01 PM, 林泽燕 <span dir="ltr"><<a href="mailto:linzeyan@pku.edu.cn" target="_blank">linzeyan@pku.edu.cn</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Dear everyone, <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">My name is Zoey Lin. I am a
Master's candidate in Computer Science at Peking University, China.
I am currently doing <span style="color:#70ad47">research
on OpenStack about the contribution composition of a code file, in order to
predict the number of defects that the file may have in the later
development stage of a release.</span><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">I wonder if I could show you my
study, including some metrics for the prediction model and a visualization
tool. I would appreciate it if you could share your opinions or give me some
advice, which would help me a great deal. Thank you so much for your
kindness. :)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">First of all, I would like to give a brief
introduction to my study. I designed and analyzed some metrics that describe the
contribution composition of a code file, and then used these metrics to train a
model that predicts, as a risk value, the number of bugs a file will have in
the later development stage of a release, using the historical commit log data
from the earlier development stage. The model performed well. I also
developed a tool to visualize the metrics and the predicted risk value, which
we believe could help developers and project managers stay aware of the
current state and risk of a code file. We expect that project managers
could use the information from the tool to estimate the workload, adjust
personnel assignments, and locate development problems, and in this way
reduce risk and improve the quality of the project more efficiently.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Next, I would like to introduce the two main
metrics of my model.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">1. Code ownership of files and
developers: <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Code ownership describes the number of
engineers contributing to a source code artifact and the relative proportion of
their contributions. The code ownership of a file is the ownership proportion
of the contributor with the highest proportion. It
indicates whether there is one developer who “owns” the file and has
a high level of expertise, and who can act as a single point of contact for others
who need to use the component, need changes to it, or just have questions about
it. A minor developer is a developer whose code ownership is lower than 5%.
Previous studies have shown that code ownership and the number of minor
developers correlate strongly with code defects.<u></u><u></u></span></p>
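<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">As a rough illustration of these definitions, here is a minimal Python sketch. It assumes that a contribution is counted as one commit touching the file; the function names and the sample authors are hypothetical and are not taken from the tool itself.</span></p>

```python
from collections import Counter

MINOR_THRESHOLD = 0.05  # from the text: a minor developer owns less than 5%


def ownership_shares(commit_authors):
    """Map each author to their share of the commits touching one file.

    Assumption: one commit counts as one contribution.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


def file_ownership(shares):
    """Code ownership of the file: the largest single author share."""
    return max(shares.values())


def minor_developers(shares, threshold=MINOR_THRESHOLD):
    """Authors whose share is strictly below the minor-developer threshold."""
    return sorted(a for a, s in shares.items() if s < threshold)


# Hypothetical commit log for one file: 40 commits by three authors.
authors = ["alice"] * 30 + ["bob"] * 9 + ["carol"]
shares = ownership_shares(authors)

print(file_ownership(shares))    # 0.75
print(minor_developers(shares))  # ['carol']
```

<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">In this made-up example the file has a clear owner (0.75) and one minor developer. Counting changed lines instead of commits would only require changing how the counts are built.</span></p>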
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">2. contribution diversity of the
file:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">We measured the uncertainty in a
code file's contributions (or the diversity of sources of contributions) in a
given month using the Teachman/Shannon entropy index, a commonly used diversity
measure in many scientific disciplines.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">H(x) = E[I(xi)] = E[
log2(1/p(xi)) ] = -∑ p(xi) log2(p(xi)) (i = 1, 2, ..., N),<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">where p(xi) is the code ownership of
developer xi, and I(xi) is the information needed to decide whether a contribution
belongs to developer xi. H(x) ranges between 0, when all the contributions to
the file belong to one developer in a release, and log2(N), when N developers
contribute equally (i.e., p(xi) = 1/N) to the file. The larger H(x) is, the more
diverse the contributions to the file are.<u></u><u></u></span></p>
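<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">The entropy index can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that p(xi) is estimated from the share of commits made by each developer.</span></p>

```python
import math
from collections import Counter


def contribution_diversity(commit_authors):
    """Shannon entropy (base 2) of the author shares for one file.

    Assumption: each list entry is the author of one commit to the file.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        p = n / total
        h -= p * math.log2(p)
    return h


# One developer made every contribution: no uncertainty.
print(contribution_diversity(["alice"] * 5))  # 0.0

# Four developers contribute equally: the maximum, log2(4).
print(contribution_diversity(["alice", "bob", "carol", "dave"]))  # 2.0
```

<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">The two calls reproduce the two boundary cases described above: 0 for a single contributor and log2(N) for N equal contributors.</span></p>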
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">We assume that the more diverse
the contributions, the more bugs the code file will have in this release.
We have found a significant positive correlation between the
contribution diversity and the number of defects in the file.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Next, I would like to walk through
these metrics with some screenshots and analysis. Assume that it is now Feb. 2016.<u></u><u></u></span></p><p class="MsoNormal"><font face="Times"> <img src="cid:61e473da$1$15b41495548$Coremail$linzeyan$pku.edu.cn"></font></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">This is part of the Release page.
It shows some basic information about the active code files in this release, together
with the number of bugs we predict for them. <span style="color:#4472c4">The pie chart shows the risk distribution of the potentially
risky files, which could help managers estimate the upcoming workload and
decide which files to focus on.</span><u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times"><u></u> <u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Then comes the File page. We take
the file neutron/plugins/ml2/drivers/<wbr>openvswitch/agent/ovs_neutron_<wbr>agent.py as
an example.<u></u><u></u></wbr></wbr></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">The model predicts that the file
will have 44 defects in the following development stage of the release (checking
against the real data, there were in fact 40 defects).<u></u><u></u></span></p><p class="MsoNormal"><font face="Times"> <img src="cid:5d721f6b$2$15b41495548$Coremail$linzeyan$pku.edu.cn"></font></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">First, let's see how the
contribution diversity and file code ownership developed over past releases
in the charts below. When the code ownership is small, which
means that no developer “owns” the file, and the contribution
diversity is large, which means that the code of the file comes from many sources
and its composition is complex, the number of defects in that release
tends to be large (for example, Liberty). In contrast, when the code ownership
increases and the contribution diversity decreases (for example, Kilo), the
number of defects is smaller. It shows that the number of bugs is affected
in some way by code ownership and contribution diversity, which
my study also confirms.<u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Now (shown in the green box
above) the code ownership is 0.11 (quite small) and the contribution diversity is
5.07 (quite large), so we can predict that the number of defects will be large,
as in Liberty.<u></u><u></u></span></p><p class="MsoNormal"><font face="Times"> <img src="cid:6f6cd9ba$3$15b41495548$Coremail$linzeyan$pku.edu.cn"></font></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Once aware of the potential
risk of the file, we try to offer some information to help locate the problems.<u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">1. In this release, we had just 3
major developers and 36 minor developers, and even <span style="color:#4472c4">the highest code ownership among the major developers is small (0.11).
As a result, there might be no one who can guide the development of the file
and act as a point of contact for others. A large number of minor developers
might also increase the contribution diversity. Both factors would result in a high risk
of defects.</span><u></u><u></u></span></p><p class="MsoNormal"><font face="Times"> <img src="cid:534a7b7c$4$15b41495548$Coremail$linzeyan$pku.edu.cn"></font></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times"><u></u> <img src="cid:3ef1349c$5$15b41495548$Coremail$linzeyan$pku.edu.cn"><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times"><u></u> <u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">2. We can see from the charts in
the purple box above that, in this release, 24 developers left the development of
this file (they contributed in the last release but not in this one). <span style="color:#4472c4">Developers leaving a code file
deprive the file of the knowledge of the decisions they have made.</span>
Previous research shows that the <span style="color:#4472c4">survivors and newcomers maintaining abandoned code have reduced
productivity and are more likely to make mistakes</span>. The file also had 26 <span style="color:#4472c4">newcomers, who might not be
familiar with the design and framework of the code file. Their new contributions
might conflict with those of others and thus introduce defects.
Therefore, I think we should pay attention to these contributions to reduce the
risk of defects.</span><u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-family:Times">3. We can also see how the
developers' contribution composition changes month by month. In these charts, we can
<span style="color:#4472c4">identify which month had an
unreasonable work distribution</span>, and then check the code contributions in
that month. In this release, we can suppose that the contributions made in
Nov. 2015 might introduce some defects, and we should <span style="color:#4472c4">pay attention to these contributions to reduce the
defect risk.</span><u></u><u></u></span></p><p class="MsoNormal">
</p><p class="MsoNormal"><font face="Times"> <img src="cid:4db50e0c$6$15b41495548$Coremail$linzeyan$pku.edu.cn"></font></p>
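<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">The counts of developers who left and of newcomers in point 2 can be derived with simple set operations. The sketch below is a hypothetical illustration: it treats a developer as having left if they contributed to the file in the last release but not in this one (as described above), and assumes the mirror-image definition for newcomers.</span></p>

```python
def turnover(previous_authors, current_authors):
    """Compare the contributors of one file across two consecutive releases."""
    prev = set(previous_authors)
    curr = set(current_authors)
    return {
        # contributed in the last release but not in this one
        "left": sorted(prev.difference(curr)),
        # assumed definition: contributed in this release but not the last
        "newcomers": sorted(curr.difference(prev)),
        # contributed in both releases
        "stayed": sorted(prev.intersection(curr)),
    }


# Hypothetical author lists for one file in two releases.
result = turnover(["alice", "bob", "carol"], ["bob", "dave"])
print(result["left"])       # ['alice', 'carol']
print(result["newcomers"])  # ['dave']
print(result["stayed"])     # ['bob']
```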
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">OK, those are some examples of the
information we can get from the visualization tool. I would like to know what
you think about the following three questions, which would be a
great help to my research:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt"><span lang="EN-US" style="font-family:Times">1.<span style="font-variant-numeric:normal;font-stretch:normal;font-size:7pt;line-height:normal;font-family:"Times New Roman"">
</span></span><span lang="EN-US" style="font-family:Times">Do you
think the metric information offered in the tool is useful for developers
and project managers in some way? <u></u><u></u></span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt;text-indent:0cm"><span lang="EN-US" style="font-family:Times">In particular, could the code ownership be
used to identify the experts on a file, and how would that help in practice?</span><span lang="EN-US"> </span><span lang="EN-US" style="font-family:Times">And do you think that files with high
code ownership have higher code quality and fewer failures?<u></u><u></u></span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt;text-indent:0cm"><span lang="EN-US" style="font-family:Times">Do you think the contribution diversity
could act as an indicator of a high risk of low code quality in a file,
and why? And what would it mean in practice when the contribution
diversity of a file changes a lot?<u></u><u></u></span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt;text-indent:0cm"><span lang="EN-US" style="font-family:Times">Do you agree that when contributors leave the
project, their code becomes hard for others to maintain, and that contributions
made by newcomers are more likely to introduce bugs into the files? So would it
help to know how many people left the project, how many people are
newcomers to the project, and who they are? If yes, how would it help in
practice?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt"><span lang="EN-US" style="font-family:Times">2.<span style="font-variant-numeric:normal;font-stretch:normal;font-size:7pt;line-height:normal;font-family:"Times New Roman"">
</span></span><span lang="EN-US" style="font-family:Times">Do you
think the analysis I made of the example (the paragraphs in blue font) makes
sense? What other information can you get from the charts, and how can it
help? Or what other information would you expect to get from the visualization tool?<u></u><u></u></span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt;text-indent:0cm"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="m_-7298554172513027697MsoListParagraph" style="margin-left:18.0pt"><span lang="EN-US" style="font-family:Times">3.<span style="font-variant-numeric:normal;font-stretch:normal;font-size:7pt;line-height:normal;font-family:"Times New Roman"">
</span></span><span lang="EN-US" style="font-family:Times">Do you
think it makes sense to predict the number of bugs that will need to be fixed in
the later development stage of a release by analyzing the metrics from the earlier
development stage of the release?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times"> </span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Again, I would greatly appreciate it
if you could share your opinions. Thank you so much for your time.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Times">Looking forward to your reply.
I wish you all a good day.<u></u><u></u></span></p>
<br><br><span><br><div style="color:rgb(51,51,51);font-family:verdana;font-size:12px"><font color="#0000ff" face="Microsoft Yahei"><span style="font-size:16px;line-height:normal">Best regards!</span></font></div><div style="color:rgb(51,51,51);font-family:verdana;font-size:12px"><span style="font-family:"Microsoft Yahei";font-size:16px;color:rgb(0,0,255)"><span style="line-height:normal">———————</span></span><span style="color:rgb(0,0,255);font-family:"Microsoft Yahei";line-height:normal;font-size:16px">———</span></div><div style="color:rgb(51,51,51);font-family:verdana;font-size:12px"><span style="font-family:"Microsoft Yahei";font-size:16px;color:rgb(0,0,255)"><span style="line-height:normal">Zeyan Lin</span><br style="color:rgb(136,136,136);font-family:Simsun;line-height:normal;font-size:medium"><span style="line-height:normal">Department of Computer Science</span><br style="color:rgb(136,136,136);font-family:Simsun;line-height:normal;font-size:medium"><span style="line-height:normal">School of Electronics Engineering & Computer Science</span><br style="color:rgb(136,136,136);font-family:Simsun;line-height:normal;font-size:medium"><span style="line-height:normal">Peking University</span><br style="color:rgb(136,136,136);font-family:Simsun;line-height:normal;font-size:medium"><span style="line-height:normal">Beijing 100871, China</span><br style="color:rgb(136,136,136);font-family:Simsun;line-height:normal;font-size:medium"><span style="line-height:normal">E-mail:<a href="mailto:linzeyan@pku.edu.cn" target="_blank">linzeyan@pku.edu.cn</a></span></span></div></span><br>______________________________<wbr>______________________________<wbr>______________<br>
<br></wbr></wbr></blockquote></div><br></div>
</blockquote></div>