Hi guys,
There is a new tool in MetricsGrimoire used to analyze the activity in MediaWiki sites:
https://github.com/MetricsGrimoire/MediaWikiAnalysis
It uses the MediaWiki API, so it can be used with any MediaWiki-based site (recent enough to have the needed API).
You can see it working in the Tech Community Metrics dashboard applied to mediawiki.org:
http://korma.wmflabs.org/browser/mediawiki.html
Right now it shows:
* Total pages created and the evolution of page creation over time.
* The same for edits.
* The same for editors.
It has incremental support, so the panel is going to be updated daily. The panel does not include bot filtering yet.
Cheers
Alvaro del Castillo, 11/11/2013 17:51:
You can see it working in the Tech Community Metrics dashboard applied to mediawiki.org:
http://korma.wmflabs.org/browser/mediawiki.html
Right now it shows:
- Total page created and evolution in time of page creation.
- The same for editions
- The same for editors
Thanks! This will be interesting for the MediaWiki wikis without regular dumps. As for mediawiki.org, in your goals what is this service going to add to the main statistics http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm ?
Some weirdnesses I noted:
1) Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack, Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
2) The number of editors is not explained. I hope it matches the metrics definitions (it seems similar) but it's not clear.
3) Number of edits per month is suspiciously low in some old months, even more than (1) would seem to justify. For instance, for October 2007 you report 28 pages (new pages, I suppose) and 275 edits (to the unknown namespaces of (1), I guess) while WikiStats says 5 new countable pages per day and 1.8 k (total) edits.
Nemo
On Mon, 2013-11-11 at 19:40 +0100, Federico Leva (Nemo) wrote:
Alvaro del Castillo, 11/11/2013 17:51:
You can see it working in the Tech Community Metrics dashboard applied to mediawiki.org:
http://korma.wmflabs.org/browser/mediawiki.html
Right now it shows:
- Total page created and evolution in time of page creation.
- The same for editions
- The same for editors
Thanks! This will be interesting for the MediaWiki wikis without regular dumps. As for mediawiki.org, in your goals what is this service going to add to the main statistics http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm ?
This is a development we're doing not in connection with our provision of metrics about MediaWiki, therefore it is not specifically aimed at improving the stats you currently have. In fact, it has been developed exactly for the scenario you mention: MediaWiki instances for which you want to get metrics via the API, not dumps.
We just thought it could be interesting to have it for WMF data too, especially if it was integrated with the rest of the development activity. In addition, we're completely open to suggestions about which visualizations could be useful for you.
Some weirdnesses I noted:
[...]
Thanks a lot for these comments, they're very useful.
Regards,
Jesus.
Jesus M. Gonzalez-Barahona, 12/11/2013 10:24:
This is a development we're doing not in connection with our provision of metrics about MediaWiki, therefore it is not specifically aimed at improving the stats you currently have. In fact, it has been developed exactly for the scenario you mention: MediaWiki instances for which you want to get metrics via the API, not dumps.
Ok. If this includes self-hosted wikis, you may want to look into https://meta.wikimedia.org/wiki/StatMediaWiki . Hm, I thought that used direct DB queries but I may misremember.
Nemo
On Tue, 2013-11-12 at 11:52 +0100, Federico Leva (Nemo) wrote:
Jesus M. Gonzalez-Barahona, 12/11/2013 10:24:
This is a development we're doing not in connection with our provision of metrics about MediaWiki, therefore it is not specifically aimed at improving the stats you currently have. In fact, it has been developed exactly for the scenario you mention: MediaWiki instances for which you want to get metrics via the API, not dumps.
Ok. If this includes self-hosted wikis, you may want to look into https://meta.wikimedia.org/wiki/StatMediaWiki . Hm, I thought that used direct DB queries but I may misremember.
Thanks! It is pretty interesting. Our approach is a bit different: retrieve information from the wiki, store it in a database, and then produce analysis, visualization, etc. from the database. But the functionality of StatMediaWiki is cool, indeed (however, it seems it still needs the dump to work, which is something we try to avoid).
Regards,
Jesus.
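The retrieve, store, analyze flow described above could be sketched roughly like this. It is only an illustration, not the MediaWikiAnalysis code: the `revisions` table, its columns and the sample rows are hypothetical, and the retrieval step is replaced by hard-coded data.

```python
import sqlite3

# Hypothetical revisions, standing in for data retrieved from a wiki.
SAMPLE_REVISIONS = [
    (1, "Main Page", "alice", "2013-10-01T10:00:00Z"),
    (2, "Main Page", "bob", "2013-10-15T12:30:00Z"),
    (3, "Help", "alice", "2013-11-02T08:15:00Z"),
]


def store_revisions(conn, revisions):
    """Store revisions; rows whose rev_id already exists are skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revisions ("
        "rev_id INTEGER PRIMARY KEY, page TEXT, editor TEXT, timestamp TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO revisions VALUES (?, ?, ?, ?)", revisions
    )
    conn.commit()


def edits_per_month(conn):
    """Return (YYYY-MM, number of edits) pairs computed from the database."""
    rows = conn.execute(
        "SELECT substr(timestamp, 1, 7) AS month, COUNT(*) "
        "FROM revisions GROUP BY month ORDER BY month"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
store_revisions(conn, SAMPLE_REVISIONS)
store_revisions(conn, SAMPLE_REVISIONS)  # a second run adds no duplicates
print(edits_per_month(conn))
```

Keying rows on the revision id is one simple way to make repeated (incremental) runs safe; the real tool may handle this differently.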
On Tue, 2013-11-12 at 11:52 +0100, Federico Leva (Nemo) wrote:
Jesus M. Gonzalez-Barahona, 12/11/2013 10:24:
This is a development we're doing not in connection with our provision of metrics about MediaWiki, therefore it is not specifically aimed at improving the stats you currently have. In fact, it has been developed exactly for the scenario you mention: MediaWiki instances for which you want to get metrics via the API, not dumps.
Ok. If this includes self-hosted wikis, you may want to look into https://meta.wikimedia.org/wiki/StatMediaWiki . Hm, I thought that used direct DB queries but I may misremember.
Yes, it needs direct access to the database:
https://forja.rediris.es/plugins/scmsvn/viewcvs.php/trunk/help.txt?root=stat... «The system needs read-access to the MediaWiki installation database.»
Our approach uses only the MediaWiki API.
But it is a pretty nice project for studying the kind of metrics it produces.
Thanks Federico!
Nemo
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Federico,
On Mon, 2013-11-11 at 19:40 +0100, Federico Leva (Nemo) wrote:
Alvaro del Castillo, 11/11/2013 17:51:
You can see it working in the Tech Community Metrics dashboard applied to mediawiki.org:
http://korma.wmflabs.org/browser/mediawiki.html
Right now it shows:
- Total page created and evolution in time of page creation.
- The same for editions
- The same for editors
Thanks! This will be interesting for the MediaWiki wikis without regular dumps.
Yes, this is our target.
As for mediawiki.org, in your goals what is this service going to add to the main statistics http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm ?
I am not sure we are going to add new things, as this report is pretty exhaustive, but we are creating graphs similar to the other panels in Tech Community Metrics.
We are also updating the data daily, as with the other data sources.
We are also integrating user identities across the whole process with the other data sources so that, for example, you can see a developer's activity in Git, Gerrit, Bugzilla, IRC, mailing lists and MediaWiki.
I think we should link to this report from Tech Community Metrics so that people can get all the details about the statistics.
Some weirdnesses I noted:
- Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack,
Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
OK, I'm taking a look at namespaces. It is easy to filter them using the API. The key is to understand which ones are content namespaces. Any recommendations? I have seen that the "* Talk" namespaces hold discussion pages, so we could leave them out.
- The number of editors is not explained. I hope it matches the metrics definitions (it seems similar) but it's not clear.
- Number of edits per month is suspiciously low in some old months, even more than (1) would seem to justify.
Yes, by default, when querying allpages we only get namespace 0 (the API default). Once this is fixed, we can recheck the data.
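As a rough sketch of the namespace issue above: the allpages list takes an `apnamespace` parameter that defaults to 0, so other namespaces have to be requested explicitly, one query each. The helper name, the example API URL and the namespace IDs below are only illustrative assumptions, and the code just builds the URLs without contacting any wiki.

```python
from urllib.parse import urlencode


def allpages_url(api_url, namespace=0, limit=500):
    """Build an allpages query URL for a single namespace (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,  # the API uses 0 when this is omitted
        "aplimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


# Assumed endpoint and illustrative namespace IDs.
API_URL = "https://www.mediawiki.org/w/api.php"
for ns in (0, 12):
    print(allpages_url(API_URL, ns))
```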
For instance, for October 2007 you report 28 pages (new pages, I suppose) and 275 edits (to the unknown namespaces of (1), I guess) while WikiStats says 5 new countable pages per day and 1.8 k (total) edits.
OK, we should look into it. The queries are not very complex [0], but it is my first time with this API, so I could be doing something wrong.
Cheers
Nemo
[0] https://github.com/MetricsGrimoire/MediaWikiAnalysis/blob/master/mediawiki_a...
Alvaro del Castillo, 12/11/2013 19:13:
Some weirdnesses I noted:
- Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack,
Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
Ok! taking a look to namespaces. It is easy to filter them using the API. The key is to understand which are content namespaces. Some recommendation?
See the definition: https://www.mediawiki.org/wiki/Content_namespace
Nemo
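For what it's worth, the API itself can report which namespaces a wiki treats as content namespaces, via `meta=siteinfo` with `siprop=namespaces`. A minimal sketch follows, assuming the response shape I remember (a `content` key on content namespaces, either an empty string or a boolean depending on the format version); the sample response is hand-written and abbreviated, not real data from any wiki.

```python
import json

# Hand-written sample in the assumed shape of
# action=query&meta=siteinfo&siprop=namespaces&format=json (not real data).
SAMPLE_RESPONSE = json.loads("""
{"query": {"namespaces": {
  "0":   {"id": 0,   "*": "",       "content": ""},
  "1":   {"id": 1,   "*": "Talk"},
  "2":   {"id": 2,   "*": "User"},
  "100": {"id": 100, "*": "Manual", "content": ""}
}}}
""")


def content_namespace_ids(response):
    """Return the sorted IDs of namespaces flagged as content namespaces."""
    namespaces = response["query"]["namespaces"]
    return sorted(
        ns["id"]
        for ns in namespaces.values()
        # Flag present as "" (older format) or True (boolean format).
        if ns.get("content") not in (None, False)
    )


print(content_namespace_ids(SAMPLE_RESPONSE))
```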
On 11/12/2013 11:38 AM, Federico Leva (Nemo) wrote:
Alvaro del Castillo, 12/11/2013 19:13:
Some weirdnesses I noted:
- Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack,
Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
Ok! taking a look to namespaces. It is easy to filter them using the API. The key is to understand which are content namespaces. Some recommendation?
See the definition: https://www.mediawiki.org/wiki/Content_namespace
Maybe the only namespaces that would be worth filtering out are User / User_talk. All the rest are equally worthwhile in wikis of software projects (the natural scope of Metrics Grimoire and the precise example of mediawiki.org).
On Tue, 2013-11-12 at 14:26 -0800, Quim Gil wrote:
On 11/12/2013 11:38 AM, Federico Leva (Nemo) wrote:
Alvaro del Castillo, 12/11/2013 19:13:
Some weirdnesses I noted:
- Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack,
Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
Ok! taking a look to namespaces. It is easy to filter them using the API. The key is to understand which are content namespaces. Some recommendation?
See the definition: https://www.mediawiki.org/wiki/Content_namespace
Maybe the only namespace that would be worth filtering is User / User_talk. All the rest are equally worth in wikis of software projects (the natural scope of Metrics Grimoire and the precise example of mediawiki.org).
After looking at other MediaWiki sites, it seems that by default all content goes to namespace 0, so covering this namespace is enough in those cases.
In sites like mediawiki.org there are other content namespaces with lots of wiki pages. In mediawiki.org, namespaces 100+102 have about the same number of pages as the default namespace 0 (7K).
Taking a look at the other namespaces, I am not sure whether they add information about activity or would just add noise to the metrics.
Quim, if you want, I can create a viz with all namespaces, or add to the tool an option to use *all* namespaces available in the MediaWiki-based site.
Cheers
Hi!
On Mon, 2013-11-11 at 19:40 +0100, Federico Leva (Nemo) wrote:
Alvaro del Castillo, 11/11/2013 17:51:
You can see it working in the Tech Community Metrics dashboard applied to mediawiki.org:
http://korma.wmflabs.org/browser/mediawiki.html
Right now it shows:
- Total page created and evolution in time of page creation.
- The same for editions
- The same for editors
Thanks! This will be interesting for the MediaWiki wikis without regular dumps. As for mediawiki.org, in your goals what is this service going to add to the main statistics http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm ? Some weirdnesses I noted:
- Total edit counts don't match, compare e.g. Krinkle, Kgh, Jack,
Jeroen: http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#wikip... . This probably means that you are considering the wrong namespaces; please use content namespaces.
We have just published a new version using content namespaces, a feature added to the tool (thanks Federico!), and the results are here:
http://korma.wmflabs.org/browser/mediawiki.html
Now for example:
* Krinkle
** stats: 3366 (Until Sep 2013)
** korma: 3410 (Until yesterday)
* Jeroen
** stats: 3303
** korma (jeroendedauw): 3219
* kghbln
** stats: 2895
** korma: 3025
So the numbers seem to be right. Jeroen's case is strange because in korma he has fewer edits than in stats, which should not happen.
- The number of editors is not explained. I hope it matches the metrics
definitions (it seems similar) but it's not clear.
Yes, it matches the metrics definition: number of different editors per month.
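To make the definition concrete, "number of different editors per month" can be computed along these lines. This is a minimal sketch with made-up revision data; the function name and the (editor, timestamp) tuple layout are assumptions, not the tool's actual schema.

```python
from collections import defaultdict


def editors_per_month(revisions):
    """Count distinct editors per YYYY-MM from (editor, ISO timestamp) pairs."""
    editors = defaultdict(set)
    for editor, timestamp in revisions:
        editors[timestamp[:7]].add(editor)  # "2007-10-03T..." -> "2007-10"
    return {month: len(names) for month, names in sorted(editors.items())}


# Made-up sample: alice edits twice in October but counts once.
sample = [
    ("alice", "2007-10-03T09:00:00Z"),
    ("alice", "2007-10-20T17:45:00Z"),
    ("bob", "2007-10-21T11:10:00Z"),
    ("bob", "2007-11-01T08:00:00Z"),
]
print(editors_per_month(sample))
```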
- Number of edits per month is suspiciously low in some old months,
even more than (1) would seem to justify. For instance, for October 2007 you report 28 pages (new pages, I suppose) and 275 edits (to the unknown namespaces of (1), I guess) while WikiStats says 5 new countable pages per day and 1.8 k (total) edits.
With the new data using all content namespaces, for October 2007: 250 new pages, 2247 edits.
Thank you very much Nemo!
Nemo
Sorry, double post: the "Revisions per author" graph is not explained either. If it reflects the previous graphs, it's just the number of revisions divided by the number of authors, so I don't see how it adds any information. If the aim is to complement raw totals with some hint of community health, you may want to graph the distribution of edits, as in http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm#editd... , or something more compact like the Gini coefficient.
Nemo
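As a sketch of the compact measure suggested above, the Gini coefficient of per-editor edit counts can be computed with the standard formula over sorted values: 0 means every editor made the same number of edits, and values near 1 mean a few editors made almost all of them. The function name and the sample counts are made up for illustration.

```python
def gini(edit_counts):
    """Gini coefficient of a list of non-negative per-editor edit counts."""
    values = sorted(edit_counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))   # perfectly even contribution
print(gini([0, 0, 0, 10]))  # one editor made every edit
```

Unlike revisions divided by authors, this number changes when the same total is spread differently among editors.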