Hello, Does the English Wikipedia currently track pageviews?
I'm doing a study looking at page ratings and how they are (or aren't) affected by a reader's understanding of the discussion process that went on behind the scenes. We'd really like to know whether the rater saw the talk page before rating the article. As a secondary goal, we'd like to see whether they edited the article and/or talk page, and as a tertiary goal, we'd like to measure how familiar they are with Wikipedia and talk pages in general (e.g., do they even know Talk pages exist, are they a frequent discussant on them, etc.). If it is possible to get the information about ratings and pageviews (esp. common fields/links between them), can somebody guide me on how to do so? If the data is not currently collected but there is a way to start doing so (i.e., no philosophical objection or significant tech/performance issue because of the caching layers), who's the right person to work with on that?
Thanks!
Grace and peace, Ben
-- W. Ben Towne wbt+wiki@cs.cmu.edu Computation, Organizations, & Society http://www.cos.cs.cmu.edu/ Carnegie Mellon University
Date: Tue, 12 Jul 2011 21:51:56 -0700
From: Dario Taraborelli <dtaraborelli@wikimedia.org>
Subject: [Wiki-research-l] New data dumps available
To: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
Message-ID: <461E101F-0005-443C-96F8-7382E8463BD4@wikimedia.org>
Content-Type: text/plain; charset="windows-1252"
As part of its product development program, the Wikimedia Foundation's Tech Department will be releasing regular data dumps for all the features that are currently being implemented. The first weekly dumps from the Article Feedback Tool, an experimental feature that invites readers to interact with Wikipedia's content via a quality rating system [1], have been available since this afternoon [2]. The latest datasets contain raw ratings data collected each week from a random sample of 100K articles of the English Wikipedia. More datasets will be released in the coming weeks as we deploy new features.
Over the summer a new series of datasets produced by the participants in the Wikimedia Summer of Research [3] will be released and an open data repository will be announced to host and permanently identify these datasets. Further details on this program and WMF's open data policy will follow on the Foundation's blog and on this list.
Dario
[1] http://www.mediawiki.org/wiki/Article_feedback
[2] http://www.mediawiki.org/wiki/Article_feedback/Data
[3] http://meta.wikimedia.org/wiki/Research:Wikimedia_Summer_of_Research_2011
-- Dario Taraborelli, PhD Senior Research Analyst Wikimedia Foundation
http://wikimediafoundation.org http://nitens.org/taraborelli
Hi Ben,
If you are interested in "pageviews", the best available public resource is:
[http://dumps.wikimedia.org/other/pagecounts-raw/]
which provides an aggregate count of views for each page, by hour (and I have a parser that stores all of this in a MySQL database, if that interests you). However, this does not map views to a particular identifier (username or IP address) or to an exact timestamp, as you seem to desire. Getting that might be tough because:
* The WMF treats the IP addresses of registered editors as confidential information. An IP address is shown only for "unregistered" editing. Regardless, to my knowledge no data pertaining to simple page access is publicly available (and if it were, it would be trivial to determine the IP addresses of registered editors).
* Assuming you were allowed to view it, even for an hour's time, the Apache-like log of en:wp access would be LARGE. Consider that the terse, aggregate format they make available is already on the order of 80 MB/hour zipped.
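For anyone who wants to work with those hourly files, here is a minimal Python sketch of reading them. It assumes each line of a pagecounts-raw file has four space-separated fields (project code, percent-encoded page title, view count, bytes transferred). The names PageCount, parse_pagecount_line, and hourly_views are hypothetical, made up for illustration, and this is not the MySQL parser mentioned above.

```python
from typing import Dict, Iterable, NamedTuple, Optional
from urllib.parse import unquote


class PageCount(NamedTuple):
    project: str     # e.g. "en" for the English Wikipedia
    title: str       # page title, percent-decoded
    views: int       # number of requests in that hour
    bytes_sent: int  # total bytes transferred in that hour


def parse_pagecount_line(line: str) -> Optional[PageCount]:
    """Parse one line; return None if it does not have the assumed layout."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    project, title, views, size = parts
    try:
        return PageCount(project, unquote(title), int(views), int(size))
    except ValueError:
        return None


def hourly_views(lines: Iterable[str], project: str = "en") -> Dict[str, int]:
    """Sum views per title for one project, skipping malformed lines."""
    totals: Dict[str, int] = {}
    for line in lines:
        rec = parse_pagecount_line(line)
        if rec is None or rec.project != project:
            continue
        totals[rec.title] = totals.get(rec.title, 0) + rec.views
    return totals


if __name__ == "__main__":
    # Made-up sample lines, not real traffic data.
    sample = [
        "en Main_Page 100 5000",
        "en Talk:Main_Page 3 900",
        "de Hauptseite 7 100",
    ]
    print(hourly_views(sample))
```

In practice the lines would come from a decompressed hourly file (for example via gzip.open in text mode). Talk-page traffic can then be picked out by the "Talk:" title prefix, though only as an hourly total per page, never per reader.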
I am not terribly familiar with the article ratings tool and its operation, but I assume it would incur the same privacy concerns. Ratings data does seem to be accessible via the API:
[http://en.wikipedia.org/w/api.php]
But there are no fields describing the user/IP that left that feedback.
-----
Of course, I speak only of publicly available data. If you are able to convince the administration to collect and confidentially share this data, it would become more feasible (although you'd be trying to trace user click-paths from a -ton- of data).
It's not my intention to discourage you, but have you thought about looking at this in a more aggregate fashion (i.e., average daily talk-page views vs. article quality rating)? -AW
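As a rough illustration of the aggregate approach suggested above, the following Python sketch averages hourly talk-page counts into a daily figure and computes a Pearson correlation against per-article mean ratings. The function names and all numbers are hypothetical. It assumes you have already collected, for each article, a list of hourly talk-page view counts and a mean rating. A correlation found this way describes articles, not what any individual rater saw.

```python
import math
from typing import Sequence


def daily_average(hourly_counts: Sequence[int], hours_per_day: int = 24) -> float:
    """Average views per day, given a flat sequence of hourly counts."""
    if not hourly_counts:
        return 0.0
    days = len(hourly_counts) / hours_per_day
    return sum(hourly_counts) / days


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up numbers for three hypothetical articles.
    talk_views_per_day = [
        daily_average([1] * 48),  # two days at 1 view/hour
        daily_average([2] * 48),
        daily_average([5] * 48),
    ]
    mean_ratings = [3.1, 3.4, 4.2]
    print(pearson(talk_views_per_day, mean_ratings))
```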
On 02/07/2012 03:03 PM, W. Ben Towne wrote:
Hello, Does the English Wikipedia currently track pageviews?