[repost with proper subscribed mail address]
Alex wrote:
The plain pageview stats are already available. Erik Zachte has been doing some work on other stats. http://stats.wikimedia.org/EN/VisitorsSampledLogRequests.htm
If I were to compile a wishlist of stats things:
1. stats.grok.se data for non-Wikipedia projects
2. A better interface for stats.wikimedia.org. There's a lot of data there, but it can be hard to find and it's not very well publicized. The only reason I knew about the link above is that someone pointed it out to me once and I bookmarked it.
3. Pageview stats at http://dammit.lt/wikistats/ in files split by project. It would be a lot easier for people at the West Flemish Wikipedia to analyze statistics themselves if they didn't have to download tons of data they don't need.
Your enhancement requests:
1 IIRC this is already an (albeit undocumented) feature. One can manually alter the URL to find e.g. Wiktionary stats. But I forgot precisely how, and I see nothing on User:Henrik's talk page.
2 Seconded wholeheartedly. In fact I started to reshape the main page (just eight links) this week :) I just uploaded it a bit earlier than planned: http://stats.wikimedia.org/
3 That could be a useful extension on the preservation script described below.
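A minimal sketch of what such a per-project split might look like. It assumes each raw line has four space-separated fields (project code, page title, request count, bytes), which is not spelled out in this thread; the function name and the sample lines are made up for illustration.

```python
from collections import defaultdict

def split_by_project(lines):
    """Group raw pagecount lines by their leading project code.

    Assumed line layout (not confirmed in this thread):
    "<project> <page_title> <count> <bytes>"
    """
    per_project = defaultdict(list)
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        per_project[parts[0]].append(line)
    return dict(per_project)

# Invented sample lines, for illustration only.
raw_lines = [
    "en Main_Page 120 4000000",
    "vls Voorblad 3 51000",
    "en Influenza 45 900000",
]
by_project = split_by_project(raw_lines)
```

With something like this, each project's lines could be written to a separate file, so a small wiki would only need to fetch its own.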
-------------------------------- General response
I would say that quite a lot has happened since the beginning of 2008. A recap:
As has already been said, Domas' (and Tim's) work was a major step forward.
Two very useful aggregators of this data on a page-by-page basis are
http://stats.grok.se/ http://wikistics.falsikon.de/
Based on the same data, at a higher aggregation level, there are visitor counts for all projects in an easily digestible form
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
Also, for the past two months we have known much more about Wikimedia traffic, thanks to 8 reports with all kinds of cross sections:
http://infodisiac.com/blog/2009/04/wikimedia-traffic-analyzed/
With regard to the dammit.lt raw data: I helped to preserve these for posterity in a more compact and slightly filtered state, so that we can query them for much longer (the dammit.lt server has space for one or two months). Actually, Mathias Schindler started this important rescue effort. Each day all files are downloaded and processed, which reduces them from 40 GB per month to 3 GB (May 2009). I also made a script to query these files, which is much more efficient than processing the original hourly files. But runtime is still considerable, so querying these files without restraints through a public interface is not advisable. The toolserver could of course get a copy of the files.
http://infodisiac.com/blog/wp-content/uploads/2009/05/influenza1.png
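To illustrate the kind of daily compaction described above, here is a rough sketch, not the actual preservation script. It assumes the hourly files hold lines of the form "project title count bytes", sums the hourly counts into one daily total per page, and drops pages below a threshold. The function name, the threshold value and the sample data are all hypothetical.

```python
from collections import Counter

def compact_day(hourly_files, min_daily_views=5):
    """Sum hourly request counts per (project, title) and drop rarely viewed pages.

    hourly_files: iterable of iterables of lines, one inner iterable per hour.
    min_daily_views: hypothetical filter threshold, chosen only for this sketch.
    """
    totals = Counter()
    for lines in hourly_files:
        for line in lines:
            parts = line.split(" ")
            if len(parts) != 4 or not parts[2].isdigit():
                continue  # ignore lines that do not match the assumed layout
            totals[(parts[0], parts[1])] += int(parts[2])
    # Keep only pages that reach the threshold; this is the "slightly filtered" part.
    return {key: n for key, n in totals.items() if n >= min_daily_views}

# Invented sample data for two hours of one day.
hour_00 = ["en Influenza 40 800000", "en Rare_page 1 2000"]
hour_01 = ["en Influenza 55 1100000", "en Rare_page 2 4000"]
daily = compact_day([hour_00, hour_01])
```

Storing one total per page per day instead of up to 24 hourly lines, and dropping the long tail of barely viewed pages, is one plausible way to get a size reduction of this order.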
Is this enough? Of course not, there is so much more to learn.
Concerning geo data: for many months a patch for Domas' (and Tim's) code, by Antonio José Reinoso Peinado, has been lying around that would add country-level geolocation data from MaxMind's public database (ip->geo lookup). Although I promised to look at it, I haven't found the time yet.
Concerning web bugs: comScore also proposed such a scheme to us. Apart from the question of how much it would bring us that we don't or can't figure out ourselves, an overriding concern is privacy.
Erik Zachte Data Analyst Wikimedia Foundation, Inc. E-Mail: ezachte@wikimedia.org
2009/6/4 Erik Zachte erikzachte@infodisiac.com:
Concerning web bugs: comScore also proposed such a scheme to us. Apart from the question of how much it would bring us that we don't or can't figure out ourselves, an overriding concern is privacy.
So if we ran our own internal web bug mechanism, with due attention to privacy, etc - would it do anything for what you do?
- d.
Is this enough? Of course not, there is so much more to learn.
Erik Zachte
There are a few very important items missing at the moment:
* Number of unique visitors
* Number of page visits per visitor
All should be analyzed by user role, possibly also over different time spans (hour, day, week), and by the likelihood of the user being a real person or a bot. The overall numbers can then be used for analyzing the squid logs. Something like this will make it possible to make valid comparisons with several stat aggregators.
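As a rough sketch of the two numbers above, assuming request records with some anonymised visitor identifier, a timestamp and a bot flag were available (none of which exists in the current logs as far as this thread says): the record layout and all names below are hypothetical, user roles are left out, and the "likelihood" of being a bot is simplified to a yes/no flag.

```python
from collections import defaultdict
from datetime import datetime

def visitor_stats(requests, bucket_format="%Y-%m-%d"):
    """Return {time_bucket: (unique_visitors, page_views_per_visitor)}.

    requests: iterable of (visitor_id, timestamp, is_bot) tuples (hypothetical layout).
    bucket_format: strftime pattern that sets the time span, e.g. "%Y-%m-%d %H" for hours.
    """
    views = defaultdict(int)
    visitors = defaultdict(set)
    for visitor_id, timestamp, is_bot in requests:
        if is_bot:
            continue  # count only requests believed to come from real people
        bucket = timestamp.strftime(bucket_format)
        views[bucket] += 1
        visitors[bucket].add(visitor_id)
    return {b: (len(visitors[b]), views[b] / len(visitors[b])) for b in views}

# Invented sample requests, for illustration only.
sample_requests = [
    ("a", datetime(2009, 6, 4, 10), False),
    ("a", datetime(2009, 6, 4, 11), False),
    ("b", datetime(2009, 6, 4, 11), False),
    ("crawler", datetime(2009, 6, 4, 11), True),
    ("a", datetime(2009, 6, 5, 9), False),
]
stats = visitor_stats(sample_requests)
```

Changing bucket_format switches between hourly, daily or weekly figures without touching the counting logic.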
John
wikimedia-l@lists.wikimedia.org