Hi Andrew,

Thanks for the link to the aggregate pageview count data. I may eventually be interested in the MySQL parser and will follow up off-list if I end up needing it.
I don't necessarily need an IP address for registered users; the username (or the IP address for anonymous users) would be sufficient to link views and ratings.  Ideally I'd reduce the data to a few bits per user (viewed talk page/edited talk page/edited article), and maybe some overall measures of Wikipedia experience level, paired with the article feedback ratings that user left.
>It's not my intention to discourage you, but have you thought about 
>looking at this in a more aggregate fashion (i.e., average daily 
>talk-page views vs. article quality rating)? -AW
I would expect any correlation there to be "explained away" when accounting for article-page views; the relevant metric there would probably be a ratio between article and talk page views.  I expect that ratio to be fairly uniform, perhaps a Gaussian distribution with relatively low variance, especially when partitioned by assessment class or "controversial" status.  If there is any significant correlation between the ratio and the article quality ratings, that would be interesting but still not enough to make causal claims about how exposure to the discussion or the type of discussion impacts perceived article quality.
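
A minimal sketch of the ratio-based check described above, assuming per-article view totals and mean ratings have already been collected somehow. The function names `talk_ratio` and `pearson` are hypothetical helpers, not part of any Wikimedia tool, and a plain Pearson coefficient is used only for illustration:

```python
import math


def talk_ratio(article_views, talk_views):
    """Talk-page views per article-page view; None if the article has no views."""
    if article_views <= 0:
        return None
    return talk_views / article_views


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Computing the ratios separately within each assessment class (or "controversial" status) before correlating them with ratings would follow the partitioning suggested above. Even a strong coefficient here would only be suggestive, not causal.
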
We'll see what we can do with the information available, and if anybody has other ideas, please reply!

Grace and peace,
Ben

--
W. Ben Towne
wbt+wiki@cs.cmu.edu
Computation, Organizations, & Society
Carnegie Mellon University

Date: Tue, 07 Feb 2012 15:43:39 -0500
From: "Andrew G. West" <westand@cis.upenn.edu>
To: Research into Wikimedia content and communities
	<wiki-research-l@lists.wikimedia.org>
Subject: Re: [Wiki-research-l] Pageviews and ratings
Message-ID: <4F318CFB.7010505@cis.upenn.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Ben,

If you are interested in "pageviews", the best available public resource is:

[http://dumps.wikimedia.org/other/pagecounts-raw/]

which provides an aggregate count of views for a page, by hour (and I 
have a parser to store all this to a MySQL database if it interests you). 
However, this does not map views to a particular identifier (username or 
IP address) or an exacting time-stamp, as you seem to desire. This might 
be tough because:

* The WMF treats the IP addresses of registered editors as confidential 
information. IP address is used for "unregistered" editing. Regardless, 
no data pertaining to simple access is available in a public-facing 
fashion to my knowledge (and if it were, it would be trivial to 
determine the IP addresses of registered editors)

* Assuming you were allowed to view it, even for an hour's time, the 
apache-like log of en:wp access would be LARGE. Consider that the terse 
and aggregate format they make available is already on the order of 
~80MB/hour zipped.
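
For reference, a rough sketch of reading the aggregate hourly files, assuming each line has four space-separated fields (project code, page title, view count, bytes transferred). This is not the MySQL parser mentioned above; `tally_views` and the namespace prefix list are hypothetical and deliberately incomplete:

```python
from collections import defaultdict

# Assumption: a small, incomplete list of non-article namespaces to skip.
NON_ARTICLE_PREFIXES = ("User:", "User_talk:", "Wikipedia:", "Wikipedia_talk:",
                        "File:", "Template:", "Category:", "Special:", "Help:")


def tally_views(lines, project="en"):
    """Sum article and talk-page views per title from pagecounts-style lines."""
    views = defaultdict(lambda: {"article": 0, "talk": 0})
    for line in lines:
        parts = line.split()
        # Skip malformed lines and lines for other projects.
        if len(parts) != 4 or parts[0] != project:
            continue
        title, count = parts[1], parts[2]
        if not count.isdigit():
            continue
        if title.startswith("Talk:"):
            # File talk-page views under the title of the matching article.
            views[title[len("Talk:"):]]["talk"] += int(count)
        elif not title.startswith(NON_ARTICLE_PREFIXES):
            views[title]["article"] += int(count)
    return dict(views)
```

Titles in these files are URL-encoded and not normalized, so a real pass would need to decode and canonicalize them before summing. Opening an hourly file with `gzip.open(path, "rt")` and passing the file object as `lines` should work for the gzipped dumps.
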

I am not terribly familiar with the article ratings tool and its 
operation, but I assume it would incur the same privacy concerns. 
Ratings data does seem to be accessible via the API:

[http://en.wikipedia.org/w/api.php]

But there are no fields describing the user/IP that left that feedback.

-----

Of course, I speak only of publicly available data. If you are able to 
convince the administration to collect and confidentially share this 
data, it would become more feasible (although you'd be trying to trace 
user click-paths from a -ton- of data).

It's not my intention to discourage you, but have you thought about 
looking at this in a more aggregate fashion (i.e., average daily 
talk-page views vs. article quality rating)? -AW