Open Stack

Fri Apr 5 19:18:13 UTC 2013

Hi Eric,

On 04/05/2013 08:31 PM, Eric Windisch wrote:
>
>
> On Friday, April 5, 2013 at 14:17 PM, Stefano Maffulli wrote:
>
>> Let me pull in the authors of the study, as they may be able to shed
>> some light on the inconsistencies you found.
>>
>> Eric, Joshua: can you please send Daniel and Jesus more details so they
>> can look into them?
>
> I made a note on the blog. The response to others indicates that their 
> results are based on two different methodologies (git-dm and their own 
> dataset analysis), this would likely be the source of differences in 
> numbers.  I haven't noticed variations anywhere except author counts, 
> but I haven't looked very hard, either.

The methodology we used to match developers to affiliations is based
partly on information from the OpenStack gitdm project, which we also
compared against our own dataset (which we already had from previous
releases). Sorry if I didn't explain this clearly on the blog.

The discrepancy comes from how we calculate the data for the
spreadsheets versus the company-by-company pages. The
company-by-company analysis had a bug: we were counting more developers
and commits than we should have (for instance, a developer who at some
point used two different email addresses was counted as two different
people).
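
A minimal sketch of the kind of double counting described above, and of
merging identities to avoid it. This is not the actual tooling behind
the report; the commit records, the alias table and the function name
count_developers are all hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical commit log: (author name, author email, company)
commits = [
    ("Alice Example", "alice@example.org", "CompanyA"),
    ("Alice Example", "alice@companya.example", "CompanyA"),
    ("Bob Example", "bob@example.org", "CompanyB"),
]

# Hypothetical alias table: every known email maps to one canonical identity
aliases = {
    "alice@example.org": "alice",
    "alice@companya.example": "alice",
    "bob@example.org": "bob",
}


def count_developers(commits, aliases=None):
    """Count distinct developers per company.

    Without an alias table each email address counts as a separate person;
    with one, all addresses of the same person collapse to one identity.
    """
    developers = defaultdict(set)
    for _name, email, company in commits:
        identity = aliases.get(email, email) if aliases else email
        developers[company].add(identity)
    return {company: len(ids) for company, ids in developers.items()}


naive = count_developers(commits)            # one person per email address
merged = count_developers(commits, aliases)  # one person per identity
print(naive, merged)
```

With this toy data the naive count reports two developers for CompanyA,
while the merged count reports one.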

So the data in the tables (at the bottom of the main page) is correct.
The source code management data shown on the left side of each
company's page is overestimated.

In addition, the number of commits for Rackspace will be a bit higher
in the next round. Another developer told us that he moved from another
company to Rackspace at some point, so you will see that number
increase a bit.

>
> I guess it could also be differences or errors in employee->company 
> mappings? Perhaps instead, one methodology includes those that report 
> bugs, while the other only accounts for git? I'm not sure.

Regarding this point, the data for the bug tracking system and the
mailing lists is based only on activity from developers. This means
that people who have not committed a change to the source code are not
counted as part of the companies' activity in Launchpad and the mailing
lists. Even so, as an example, we are covering around 60% of the
activity on the mailing lists, because the people who at some point
submitted changes to Git are that active.
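
The filtering described above can be sketched as follows. This is only
an illustration under stated assumptions: the committer set, the
message list and the function name developer_coverage are invented toy
values, not data from the report.

```python
# Hypothetical identities that have committed to Git at least once
committers = {"alice", "bob"}

# Hypothetical mailing list traffic: one sender identity per message
messages = ["alice", "alice", "bob", "carol", "dave"]


def developer_coverage(messages, committers):
    """Return the fraction of messages sent by people who have committed."""
    if not messages:
        return 0.0
    from_developers = sum(1 for sender in messages if sender in committers)
    return from_developers / len(messages)


coverage = developer_coverage(messages, committers)
print(f"{coverage:.0%} of messages come from committers")
```

Messages from carol and dave are dropped because they never committed,
so they do not count towards any company's mailing list activity.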

Our purpose with this is to show only activity from developers and
their affiliations across the three data sources (Git, tickets and
mailing lists). This is only one possible approach. From our point of
view this analysis was pretty interesting, but perhaps for others it is
not good enough.

>
> Other things like dividing commits/authors seems to just be the wrong 
> methodology where a median would be more appropriate and harder to game.
>

This is a good point. As you mention, a median is probably a fairer
metric. At some point we would like to show some boxplots and other
metrics to better understand the distribution of the datasets, but we
had to choose a few. In any case, we will definitely take this into
account for the next reports. Thanks!
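
To illustrate why the median is harder to skew than commits divided by
authors, here is a small sketch using Python's standard statistics
module. The per-author commit counts are hypothetical and chosen only
to show the effect of a single very active author.

```python
from statistics import mean, median

# Hypothetical commits per author for one company: four occasional
# contributors and one very active one
commits_per_author = [1, 2, 2, 3, 92]

average = mean(commits_per_author)    # total commits / number of authors
typical = median(commits_per_author)  # middle value of the sorted counts

print(f"mean: {average}, median: {typical}")
```

Here the mean is 20 commits per author although four of the five
authors made three commits or fewer; the median of 2 is closer to what
a typical author did.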

A good approach would probably be to build a common view among all the
people interested in this type of analysis. That way we could reach an
agreement on how to visualize the data, which metrics are necessary and
interesting, a common methodology for measuring things, and which
projects to include. This analysis is just one possibility; there are
certainly others.

In any case, please let us know about any other concerns you may have.
Any feedback from the community is more than appreciated.

Thanks a lot for all your comments.

Regards,
Daniel Izquierdo.

> Regards,
> Eric Windisch
>

-- 
|\_____/| Daniel Izquierdo Cortázar
  [o] [o]  dizquierdo at bitergia.com - Ph.D
  |  V  |  http://www.bitergia.com
   |   |
-ooo-ooo-

[openstack-dev] Grizzly's out - let the numbers begin...
