* MZMcBride wrote:
I've been asked a few times recently about doing reports of the most-viewed pages per month/per day/per year/etc. A few years after Domas first started publishing this information in raw form, the current situation seems rather bleak. Henrik has a visualization tool with a very simple JSON API behind it (http://stats.grok.se), but other than that, I don't know of any efforts to put this data into a database.
When making http://katograph.appspot.com/, which renders the German Wikipedia category system as an interactive "treemap" based on information such as the number of articles in each category and the requests during a 3-day period, I found that the proxy logs used for stats.grok.se are rather unreliable: many of the "top" pages are implausible (articles on not very notable subjects that have existed only for a very short time show up in the top ten, for instance). You can see this on http://stats.grok.se/en/top as well: 40 million views for `Special:Export/Robert L. Bradley, Jr` is rather implausible, as far as human users are concerned.
Is it worth a Toolserver user's time to try to create a database of per-project, per-page page view statistics? Is it worth a grant from the Wikimedia Foundation to have someone work on this? Is it worth trying to convince Wikimedia Deutschland to assign resources? And, of course, it wouldn't be a bad idea if Domas' first-pass implementation was improved on Wikimedia's side, regardless.
The data that powers stats.grok.se is available for download; it should be rather trivial to feed it into Toolserver databases and query it as desired, ignoring performance problems. But unless you believe that in December 2010 "User Datagram Protocol" was more interesting to people than Julian Assange, you would need some other data source to make good statistics. http://stats.grok.se/de/201009/Ngai.cc would be another example.
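To illustrate the "feed it into a database and query it" part, here is a rough sketch rather than a finished tool. It assumes the dump lines have the form `project page_title view_count bytes`, separated by single spaces, and it uses an in-memory SQLite database in place of a Toolserver database. The table name `page_views`, the helper names, and all sample counts are made up for the example. Dropping `Special:` titles only removes one obvious class of non-human traffic; it would do nothing about cases like "User Datagram Protocol".

```python
import sqlite3


def parse_line(line):
    """Parse one 'project title views bytes' line; return None if malformed."""
    parts = line.strip().split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def load_counts(conn, lines):
    """Insert all well-formed lines into a (hypothetical) page_views table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views "
        "(project TEXT, title TEXT, views INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    conn.commit()


def top_pages(conn, project, limit=10):
    """Return (title, total_views) for a project, skipping Special: pages."""
    cur = conn.execute(
        "SELECT title, SUM(views) AS total FROM page_views "
        "WHERE project = ? AND title NOT LIKE 'Special:%' "
        "GROUP BY title ORDER BY total DESC LIMIT ?",
        (project, limit),
    )
    return cur.fetchall()


# Made-up sample lines in the assumed format (counts are not real data).
sample = [
    "en Special:Export/Robert_L._Bradley,_Jr 500 1000",
    "en Julian_Assange 300 9000",
    "en Julian_Assange 200 6000",
    "en User_Datagram_Protocol 100 4000",
    "de Ngai.cc 50 700",
    "this line is malformed and gets skipped",
]

conn = sqlite3.connect(":memory:")
load_counts(conn, sample)
print(top_pages(conn, "en"))
```

With real dumps you would read the hourly files line by line instead of using a list, and you would want an index on `(project, title)` before running queries like this over more than a few hours of data.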