Date: Tue, 07 Feb 2012 15:43:39 -0500
From: "Andrew G. West" <westand@cis.upenn.edu>
To: Research into Wikimedia content and communities
<wiki-research-l@lists.wikimedia.org>
Subject: Re: [Wiki-research-l] Pageviews and ratings
Message-ID: <4F318CFB.7010505@cis.upenn.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Ben,
If you are interested in "pageviews", the best available public resource is:
[http://dumps.wikimedia.org/other/pagecounts-raw/]
which provides an aggregate count of views for each page, by hour (and I
have a parser that stores all of this in a MySQL database, if that
interests you).
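
To give a rough idea of what parsing those hourly files involves, here
is a minimal illustrative sketch. It is not the parser mentioned above:
it swaps in Python's built-in sqlite3 for MySQL so that it is
self-contained. It assumes each line of an hourly file holds four
space-separated fields (project code, page title, view count, bytes
transferred). The function names, table name, and hour label are
hypothetical.

```python
import gzip
import sqlite3
from typing import Iterable, Iterator, Tuple


def parse_pagecounts(lines: Iterable[str],
                     project: str = "en") -> Iterator[Tuple[str, int]]:
    """Yield (page_title, views) for one project from hourly dump lines.

    Assumed line format: "<project> <title> <views> <bytes>".
    """
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) != 4:
            continue  # skip lines that do not match the assumed format
        proj, title, count, _size = fields
        if proj != project:
            continue
        try:
            views = int(count)
        except ValueError:
            continue  # skip lines with a non-numeric view count
        yield title, views


def store_hour(conn: sqlite3.Connection, hour: str,
               rows: Iterable[Tuple[str, int]]) -> None:
    """Insert one hour of (title, views) rows into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews "
        "(hour TEXT, title TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        ((hour, title, views) for title, views in rows),
    )
    conn.commit()


def load_hour_file(conn: sqlite3.Connection, path: str, hour: str) -> None:
    """Read one gzip-compressed hourly file and store its 'en' rows."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        store_hour(conn, hour, parse_pagecounts(fh))
```

Calling load_hour_file() once per downloaded file would build up an
hour-by-hour table that can then be summed per page or per day. Note
that nothing in it identifies who viewed a page.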
However, this does not map views to a particular identifier (username or
IP address) or to an exact timestamp, as you seem to want. That data
would be hard to get because:
* The WMF treats the IP addresses of registered editors as confidential
information. IP addresses are exposed only for "unregistered" edits.
Regardless, no data pertaining to simple page access is publicly
available, to my knowledge (and if it were, it would be trivial to
determine the IP addresses of registered editors).
* Assuming you were allowed to view it, even for an hour's time, the
Apache-like log of en:wp access would be LARGE. Consider that the terse,
aggregate format they make available is already on the order of
80MB/hour compressed.
I am not terribly familiar with the article ratings tool and its
operation, but I assume it would incur the same privacy concerns.
Ratings data does seem to be accessible via the API:
[http://en.wikipedia.org/w/api.php]
But there are no fields describing the user/IP that left that feedback.
-----
Of course, I speak only of publicly available data. If you are able to
convince the administration to collect and confidentially share this
data, it would become more feasible (although you'd be trying to trace
user click-paths from a -ton- of data).
It's not my intention to discourage you, but have you thought about
looking at this in a more aggregate fashion (e.g., average daily
talk-page views vs. article quality rating)? -AW