[openstack-dev] Grizzly's out - let the numbers begin...

Matt Joyce matt.joyce at cloudscaling.com
Fri Apr 5 21:12:17 UTC 2013


Made a quick attempt to guess the gender of devs from their names, to see
whether there is a gender bias in development.

https://github.com/openfly/openstack-rnd/tree/master/openstack-gender

Came up with

females : 25
males : 262
unknowncount : 240

the gender import came from

https://github.com/Bemmu/gender-from-name

which uses US Census data (obviously going to have a bias there, which is
probably why I have 240 unknowns...).
If someone knows of a better dataset to test against, I'd love to hear it.
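
A minimal sketch of the kind of tally described above, assuming a plain
first-name lookup table. NAME_TO_GENDER, guess_gender and tally are
hypothetical names for illustration, not the gender-from-name API, and the
sample names are partly made up.

```python
from collections import Counter

# Hypothetical lookup table; a real run would load a much larger name list
# (for example one derived from census data).
NAME_TO_GENDER = {"matt": "male", "eric": "male", "anne": "female"}


def guess_gender(full_name):
    """Guess gender from the first token of a name, else 'unknown'."""
    parts = full_name.strip().split()
    if not parts:
        return "unknown"
    return NAME_TO_GENDER.get(parts[0].lower(), "unknown")


def tally(names):
    """Count how many names fall into each bucket."""
    return Counter(guess_gender(name) for name in names)


# First names missing from the table land in the 'unknown' bucket.
counts = tally(["Matt Joyce", "Eric Windisch", "Anne Example", "Xiaoming Example"])
print(dict(counts))
```

Any name the table does not cover ends up as unknown, which is how a
US-centric name list can leave 240 of 527 names unclassified.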

Alternatively.... git push your name and gender to the gender.py   =D

-Matt

On Fri, Apr 5, 2013 at 12:18 PM, Daniel Izquierdo
<dizquierdo at bitergia.com> wrote:

> Hi Eric,
>
>
> On 04/05/2013 08:31 PM, Eric Windisch wrote:
>
>>
>>
>> On Friday, April 5, 2013 at 14:17 PM, Stefano Maffulli wrote:
>>
>>> Let me pull in the authors of the study, as they may be able to shed
>>> some light on the inconsistencies you found.
>>>
>>> Eric, Joshua: can you please send Daniel and Jesus more details so they
>>> can look into them?
>>>
>>
>> I made a note on the blog. The response to others indicates that their
>> results are based on two different methodologies (git-dm and their own
>> dataset analysis); this would likely be the source of the differences in
>> numbers. I haven't noticed variations anywhere except author counts, but I
>> haven't looked very hard, either.
>>
>
> The methodology we have used to match developers and affiliations is based
> on information partially obtained from the OpenStack gitdm project, but
> also compared to our own dataset (that we already had from previous
> releases). Sorry if I didn't explain myself consistently in the blog.
>
> The bug here is related to how we calculate the data for the spreadsheets
> and for the company-by-company analysis. The company-by-company analysis
> overcounted both developers and commits (for instance, a developer who at
> some point used two different email addresses was counted as two different
> people).
>
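
A minimal sketch of that kind of double counting, assuming a made-up commit
log and a hypothetical alias map (ALIASES and canonical are illustrative
names, not part of gitdm or the Bitergia tooling):

```python
# Made-up commit log: (author email, commit id).
commits = [
    ("dev@example.com", "c1"),
    ("dev@corp.example", "c2"),  # same person, second address
    ("other@example.com", "c3"),
]

# Hypothetical alias map: secondary address -> canonical identity.
ALIASES = {"dev@corp.example": "dev@example.com"}


def canonical(email):
    """Map an address to its canonical identity, case-insensitively."""
    email = email.lower()
    return ALIASES.get(email, email)


# Counting raw addresses treats the two-address developer as two people.
naive_authors = {email for email, _ in commits}
# Counting canonical identities merges them back into one.
merged_authors = {canonical(email) for email, _ in commits}

print(len(naive_authors), len(merged_authors))
```

Counting by raw address reports three developers here, while counting by
canonical identity reports two.
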
> So the data in the tables (bottom part of the main page) is correct. The
> data for the source code management system shown in the left part for
> each of the companies is overestimated.
>
> In addition, the number of commits for Rackspace will be a bit higher in
> the next round. Another developer told us that he moved from another
> company to Rackspace at some point, so you will see that number increase a
> bit.
>
>
>
>> I guess it could also be differences or errors in employee->company
>> mappings? Perhaps instead, one methodology includes those that report bugs,
>> while the other only accounts for git? I'm not sure.
>>
>
> Regarding this point, the data about the bug tracking system and mailing
> lists is based only on activity from developers. This means that people
> who have not committed a change to the source code are not counted as part
> of the activity of companies in Launchpad and the mailing lists. In any
> case, as an example, we're covering around 60% of the activity in the
> mailing lists, because people who at some point submitted changes to Git
> are that active.
>
> Our purpose with this is to show only activity from developers and their
> affiliations through the three data sources (git, tickets and mailing
> lists). This is also an option. From our point of view this analysis was
> pretty interesting, but perhaps for others this is not good enough.
>
>
>
>> Other things, like dividing commits by authors, just seem to be the wrong
>> methodology; a median would be more appropriate and harder to game.
>>
>>
> This is a good point. As you mention, it is probably fairer to have such a
> metric. At some point we would like to show some boxplots and other metrics
> to better understand the distribution of the datasets, but we had to choose
> some. In any case, we will take this into account for the next reports for
> sure. Thanks!
>
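
A small sketch of why the median is harder to game than commits divided by
authors, using made-up per-author commit counts (not data from the study):

```python
from statistics import mean, median

# Made-up commit counts per author for one company; one author has many
# small commits, which is the kind of outlier that skews an average.
commits_per_author = [1, 2, 2, 3, 150]

# commits / authors is the arithmetic mean, pulled up by the outlier.
average = mean(commits_per_author)
# The median only looks at the middle author, so one outlier barely moves it.
middle = median(commits_per_author)

print(f"mean={average:.1f} median={middle}")
```

Here the mean is 31.6 commits per author while the median is 2, so a single
very active (or commit-splitting) author dominates the first figure but not
the second.
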
> Probably a good approach would be to have a common view with all of the
> people interested in this type of analysis. In this way we could reach an
> agreement about how to visualize data, necessary and interesting metrics,
> common methodology to measure stuff and projects involved. This analysis is
> just a possibility, but there are some more for sure.
>
> In any case, please let us know of any other concerns you may have; any
> feedback from the community is more than appreciated.
>
> Thanks a lot for all your comments.
>
>
> Regards,
> Daniel Izquierdo.
>
>> Regards,
>> Eric Windisch
>>
>>
>
> --
> |\_____/| Daniel Izquierdo Cortázar
>  [o] [o]  dizquierdo at bitergia.com - Ph.D
>  |  V  |  http://www.bitergia.com
>   |   |
> -ooo-ooo-
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>