Last October we got a batch of MediaWiki developer stats thanks to Ohloh's aggregation of data [1]. Now we are getting plenty more stats from Bitergia, including data from bug reports and mailing lists:
http://blog.bitergia.com/2012/12/03/complete-basic-analysis-of-mediawiki/
Bitergia is a company based in Madrid, formed by a small team of developers who have been working on FLOSS stats software for a long time. All the tools they develop are free software, publicly available and open to contributions.
They have been kind enough to contribute some time and work setting up stats for the MediaWiki community. They also welcome feedback about the service and the data collected. I'm CCing Jesús M. González-Barahona, who has been my regular contact for this task in the past weeks.
All good news for http://www.mediawiki.org/wiki/Community_Metrics !
[1] https://www.ohloh.net/orgs/wikimedia
That data is hardly useful, it doesn't explain what it refers to and, even when it does, seems wrong. Compare e.g. https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits Also, https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10000... proves they're not talking of the whole bugzilla but then they don't say which components.
Nemo
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
Compare e.g. https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits Also, https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10000... proves they're not talking of the whole bugzilla but then they don't say which components.
Would be helpful to mention the exact dataset you refer to.
Also I'd rather challenge weekly-bug-summary.cgi's results: MediaWiki extensions has 2031 open bugs, and only 1883 have been filed in the last 100000 days? => 148 bug reports got opened more than 274 years ago?
But maybe I fail to read weekly-bug-summary.cgi correctly.
andre
Hi Andre,
On Dec 3, 2012, at 7:51 PM, Andre Klapper aklapper@wikimedia.org wrote:
But maybe I fail to read weekly-bug-summary.cgi correctly.
Well, you don't misread it. I think the UI is just misleading, because the 100000 days are simply echoed into the table header. The script does not account for the "real age" of the bug.
The oldest bug, number 1, was created by Brion on Aug 10, 2004. Between that day and today there are 3039 days (including today). Replacing the number of days in the request with that value therefore gives the same result. At least in my data, the first bug was closed on May 22, 2005, which then makes 2754 days... But now it gets complicated, because too many changes do not show up in my data (which is based on the Bugzilla API). But you get my point :)
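The two day counts can be checked with a short shell sketch. It assumes GNU date (for the -d option) and that "today" was 2012-12-04, a date the message does not state but which reproduces both figures; days_inclusive is a hypothetical helper name.

```shell
# Inclusive number of days between two dates (both endpoints counted).
# Assumes GNU date; -u avoids daylight-saving offsets in the subtraction.
days_inclusive() {
  local start end
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( (end - start) / 86400 + 1 ))
}

# Assumption: "today" is 2012-12-04.
days_inclusive 2004-08-10 2012-12-04   # bug 1 created: prints 3039
days_inclusive 2005-05-22 2012-12-04   # first bug closed: prints 2754
```

Passing either value as the days parameter of weekly-bug-summary.cgi is what the paragraph above describes.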
However, I very much like the Bitergia stats - very good first step.
andre
Andre Klapper | Wikimedia Bugwrangler http://blogs.gnome.org/aklapper/
Claudia
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 03/12/12 19:40, Federico Leva (Nemo) wrote:
That data is hardly useful, it doesn't explain what it refers to
I agree a glossary of each term would be useful. It took me a while to realise that committers/closers/senders were the terms used for users of git/bugzilla/mailing lists.
They should track authors instead of committers, though (preferably skipping merge commits).
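A minimal sketch of what author-based counting without merge commits could look like, using plain git rather than whatever Bitergia's tools do internally. %ae (author e-mail) and --no-merges are standard git log options; authors_no_merges and rank_emails are hypothetical helper names.

```shell
# List the author e-mail of every non-merge commit.
# %ae is the author; %ce would give the committer instead.
authors_no_merges() {
  git log --no-merges --format=%ae "$@"
}

# Rank e-mail addresses read from stdin by number of occurrences,
# printing "COUNT ADDRESS" with the most frequent first.
rank_emails() {
  sort | uniq -c | sort -k1,1nr -k2 | awk '{print $1, $2}'
}

# Usage, inside a clone of the repository:
#   authors_no_merges --all | rank_emails | head
```

This still counts one person twice if they committed under several addresses, so merging users would remain a separate step.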
Also, https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10000... proves they're not talking of the whole bugzilla but then they don't say which components.
Nemo
Looking at http://bitergia.com/public/previews/2012_11_mediawiki/data/db/acs_bicho_medi... they seem to have obtained data from bugs 1 to 19775, so it does not look like they skipped bugs based on components.
Seems that Jesús did a fine job. It could be polished quite a bit more with some local knowledge: merging users, hiding bots, etc. I would also change the layout of the summary page, making the graphs larger and placing the tables below. Plus some cosmetic issues: empty brackets, missing names...
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote: [...]
Also, https://bugzilla.wikimedia.org/weekly-bug-summary.cgi?tops=10&days=10000... proves they're not talking of the whole bugzilla but then they don't say which components.
Our mining is for the MediaWiki product. In particular, the url we're using is:
https://bugzilla.wikimedia.org/buglist.cgi?product=MediaWiki
If you look in the bicho database available at http://bitergia.com/public/previews/2012_11_mediawiki/data/db/ you can count the tickets:
mysql> select count(id) from issues;
+-----------+
| count(id) |
+-----------+
|     19776 |
+-----------+
This is consistent with the 19953 tickets that I can see right now in Bugzilla.
You're right that it is not obvious that we're only considering this product; we're fixing that.
Thanks for the advice!
Jesus.
On Mon, 2012-12-03 at 19:40 +0100, Federico Leva (Nemo) wrote:
That data is hardly useful, it doesn't explain what it refers to and, even when it does, seems wrong. Compare e.g. https://www.ohloh.net/p/mediawiki/contributors?query=&sort=commits
[...]
ok, some info about this one. It seems Ohloh is counting commits in the master branch. If you just use the git log to get the main stats:
$ git log --format=format:%ae > Authors
$ grep brion Authors | wc -l
4493
$ grep tstarling Authors | wc -l
2554
which is pretty much what you see in Ohloh.
In our case, we're counting *all* the activity in the repository (all branches):
$ git log --all --format=format:%ae > Authors
$ grep brion Authors | wc -l
5425
$ grep tstarling Authors | wc -l
3068
Which is pretty much our data.
To be honest, I'm not sure which one (counting only the master branch, or all branches) is better. We should probably provide both, or even a separate count for each branch, so that users can decide which data better suits their needs. I'll take note of this.
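One way to provide both figures, sketched with the same git log approach used earlier in this message. count_by_author and per_branch_counts are hypothetical helpers, not part of any Bitergia tool, and matching on a substring of the e-mail is as rough as the grep calls above.

```shell
# Count commits whose author e-mail contains PATTERN.
# Any further arguments are passed to git log (a branch name, --all, ...).
count_by_author() {
  local pattern=$1
  shift
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  git log --format=%ae "$@" | grep -c -- "$pattern" || true
}

# Print "BRANCH COUNT" for every local branch.
per_branch_counts() {
  git for-each-ref --format='%(refname:short)' refs/heads |
  while read -r branch; do
    printf '%s %s\n' "$branch" "$(count_by_author "$1" "$branch")"
  done
}

# Usage, inside a clone of the repository:
#   count_by_author brion master   # master only, as Ohloh seems to count
#   count_by_author brion --all    # every ref, as in our data
#   per_branch_counts brion
```

Note that per-branch counts overlap, since a commit reachable from several branches is counted once for each of them.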
Again, thanks for pointing it out.
Regards,
Jesus.