Thanks for bringing this up! I don't have any answers, but there's a feature I'd like to build on this dataset. I wonder if bringing this stuff into a more readily available database could be part of that project in some way.
Basically, I'd like to publish per-editor pageview stats. That is, MediaWiki would keep track of the number of times an article has been viewed since the first day you edited it, and let you know how many times your edits have been seen (approximately, depending on the resolution of the data). I think such personalized stats could really help to drive editor retention. The information is available now through Henrik's tool, but even if you know about stats.grok.se, it's hard to keep track and make the connection between the graphs there and one's own contributions.
Clearly, pageview data of at least daily resolution would be required to make such a thing work.
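To make the idea a bit more concrete, here is a rough sketch of the per-editor count described above. It assumes daily view counts for an article are already available as a mapping from date to views. The function name, the data layout, and the sample numbers are all made up for illustration; none of this is existing MediaWiki code.

```python
from datetime import date


def views_since_first_edit(daily_views, first_edit_day):
    """Sum the daily view counts from the editor's first edit day onward.

    daily_views: dict mapping datetime.date -> view count (hypothetical layout).
    first_edit_day: date of the editor's first edit to the article.
    """
    return sum(
        count for day, count in daily_views.items() if day >= first_edit_day
    )


# Invented sample data for a single article.
daily_views = {
    date(2011, 8, 1): 120,
    date(2011, 8, 2): 95,
    date(2011, 8, 3): 143,
    date(2011, 8, 4): 110,
}

# An editor whose first edit was on 2011-08-03 gets credit for two days of views.
print(views_since_first_edit(daily_views, date(2011, 8, 3)))  # 253
```

With only monthly data, the same sum would have to count the whole month of the first edit or none of it, which is why the resolution matters.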
Are there other specific projects that require this data? It will be much easier to make a case for accelerating development of the dataset if there are some clear examples of where it's needed, and especially if it can help to meet the current editor retention goals.
-Ian
On Thu, Aug 11, 2011 at 3:12 PM, MZMcBride <z@mzmcbride.com> wrote:
Hi.
I've been asked a few times recently about doing reports of the most-viewed pages per month/per day/per year/etc. A few years after Domas first started publishing this information in raw form, the current situation seems rather bleak. Henrik has a visualization tool with a very simple JSON API behind it (http://stats.grok.se), but other than that, I don't know of any efforts to put this data into a database.
Currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to Henrik's tool. At one per second, you're looking at over a month's worth of continuous fetching. This is obviously not practical.
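For reference, the back-of-the-envelope arithmetic behind that estimate can be checked with a few lines. The article count and the one-request-per-second rate are the figures from the paragraph above; the function name is just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_fetch(n_requests, requests_per_second):
    """Days of continuous fetching needed for n_requests at a fixed rate."""
    return n_requests / requests_per_second / SECONDS_PER_DAY


# 3.7 million articles, one HTTP request each, at one request per second.
print(round(days_to_fetch(3_700_000, 1), 1))  # 42.8
```

That is roughly six weeks for a single pass over the English Wikipedia alone, before any retries or other projects.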
A lot of people were waiting on Wikimedia's Open Web Analytics work to come to fruition, but it seems that has been indefinitely put on hold. (Is that right?)
Is it worth a Toolserver user's time to try to create a database of per-project, per-page page view statistics? Is it worth a grant from the Wikimedia Foundation to have someone work on this? Is it worth trying to convince Wikimedia Deutschland to assign resources? And, of course, it wouldn't be a bad idea if Domas' first-pass implementation was improved on Wikimedia's side, regardless.
Thoughts and comments welcome on this. There's a lot of desire to have a usable system.
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l