[openstack-dev] [Metrics] Improving the data about contributor/affiliation/time
Stefano Maffulli
stefano at openstack.org
Thu Oct 17 21:34:14 UTC 2013
hello folks
first of all: congratulations to all developers, testers, users,
translators, tech writers for the new release: Havana is out of the gate
with impressive numbers.
Speaking of numbers, a lot of you have noticed mistakes in the reported
numbers, from misspelling of names to missing/wrong company
affiliations. With my apologies for the mistakes comes an explanation of
where I see things fail and a suggestion on how to fix this for the future.
Currently there are three places where statistics about the project are
released:
- OpenStack Activity Board http://activity.openstack.org/
- gitdm http://git.openstack.org/cgit/openstack-infra/gitdm/
- Stackalytics http://git.openstack.org/cgit/stackforge/stackalytics/
Activity Board is actually made of two pieces: the Dash and Insights.
Insights pulls straight from the OpenStack Foundation Members db
http://www.openstack.org/community/members/, so what you see in personal
pages like
http://activity.openstack.org/data/plugins/zfacts/view.action?instance=Person,person3986c85a-b9af-4686-8c7b-45525f62e396
is exactly what is written on Robert's personal profile
http://www.openstack.org/community/members/profile/3619 (these
Confluence pages are updated daily).
The data about companies on the Dash are the result of semi-automatic
processing and cleanup of the data from OpenStack Foundation Members db.
The cleanup is necessary because a) one can't always rely on people
spelling the name of their company correctly and b) the Profile pages
lack the UI to properly track the history of affiliation [1]. Here is what
the Dash looks like for Canonical:
http://activity.openstack.org/dash/releases/company.html?company=Canonical
gitdm and Stackalytics take their developer/company/time tuples from
files maintained by the developers themselves, supplemented by
heuristics that 'guess' affiliations from things like email addresses in
the commit logs.
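To make the kind of heuristic concrete, here is a minimal sketch of a
lookup that prefers a developer-maintained mapping and falls back to the
email domain. It is not the actual gitdm or Stackalytics code: the
function name, the domain table and the "Unknown" fallback are all
assumptions made up for illustration.

```python
# Hypothetical domain-to-company table; not taken from gitdm or Stackalytics.
DOMAIN_TO_COMPANY = {
    "canonical.com": "Canonical",
    "openstack.org": "OpenStack Foundation",
}


def guess_affiliation(email, explicit=None):
    """Return a company name for a commit author's email address.

    `explicit` stands in for the developer-maintained file: a dict that
    maps lowercase email addresses to company names. It always wins over
    the domain-based guess.
    """
    explicit = explicit or {}
    key = email.strip().lower()
    if key in explicit:
        return explicit[key]
    # Everything after the last "@" is treated as the domain.
    domain = key.rpartition("@")[2]
    return DOMAIN_TO_COMPANY.get(domain, "Unknown")
```

A guess like this is wrong whenever someone commits from a personal
address or changes employer, which is why the explicit mapping has to
take precedence.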
Having four sources of data for this reporting is bad and not sustainable.
Since it seems commonly accepted that all developers need to be members
of the Foundation, and that Foundation members need to state their
affiliation when they join and keep such data current when it changes, I
think the Foundation is in a good place to provide the authoritative
data for all projects to use.
We can make things easier by making the personal profile pages more
useful, so people log in more often and improve the quality of the data.
Fixing the known shortcomings mentioned above is one step. Furthermore,
we're working to develop an OpenID provider based on the Members DB that
will be used across all our web properties (from gerrit to the upcoming
groups.openstack.org, etc.), so those profiles will be used for more
than just the initial signup to be a member [2].
Since nobody can rely on user input, we will still have to clean up the
data as it comes in from the Members DB in order to create a 'Master
Data Record' that we can export for all to consume. Here things get a
bit fuzzy because currently the Members DB has an API that is not
designed to be securely consumed publicly [3].
What I think we can do is have a periodic job pull the full list of
members and their stated affiliation, then run an automatic/manual
cleanup/sanitizing job on it that creates files/tables ready to be
consumed by all projects.
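As a rough sketch of what the automatic part of such a cleanup job could
look like, the snippet below normalizes company spellings and answers
"who did this member work for on a given day". The row layout, the alias
table and all function names are assumptions for illustration only; the
real Members DB export will look different.

```python
from datetime import date

# Hypothetical alias table: lowercase spelling -> canonical company name.
COMPANY_ALIASES = {
    "canonical": "Canonical",
    "canonical ltd": "Canonical",
    "canonical ltd.": "Canonical",
}


def normalize_company(raw):
    """Collapse whitespace and map known spellings to one canonical name."""
    cleaned = " ".join(raw.split())
    return COMPANY_ALIASES.get(cleaned.lower(), cleaned)


def build_master_records(raw_rows):
    """Turn (member, company_as_typed, start, end_or_None) rows into
    cleaned records sorted by member and start date."""
    records = [(m, normalize_company(c), s, e) for m, c, s, e in raw_rows]
    return sorted(records, key=lambda r: (r[0], r[2]))


def affiliation_on(records, member, day):
    """Return the member's company on `day`, or None if nothing matches.

    An end date of None means the affiliation is still current; the end
    date itself is treated as the first day of the next affiliation.
    """
    for m, company, start, end in records:
        if m == member and start <= day and (end is None or day < end):
            return company
    return None
```

Spellings that are not in the alias table pass through unchanged, so a
person can review them by hand; that would be the manual half of the
job.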
What do you think? I'm interested in gathering more ideas and laying
down a plan to fix this issue.
thanks,
stef
[1] To mitigate problem a), the system suggests the proper spelling when
you start typing. For problem b), a fix is coming to the site.
[2] I'll send more details about this project soon
https://blueprints.launchpad.net/openstack-ci/+spec/sso-openid-provider
[3] The Members DB is tightly connected to the web site openstack.org.
There is an effort to move the whole site under openstack-infra/ so this
pain point will be removed soon, hopefully.
PS did you look at the numbers?
http://www.openstack.org/software/havana/
http://blog.bitergia.com/2013/10/17/the-openstack-havana-release/
--
Ask and answer questions on https://ask.openstack.org