New subject: [Wiki-research-l] Wikipedia aggregate clickstream data released

18 Feb 2015


We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
    • determining the most frequent links people click on for a given article
    • determining the most common links people followed to an article
    • determining what fraction of the total traffic to an article went on to click a link in that article
    • generating a Markov chain over English Wikipedia
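
As a rough sketch of the uses listed above, the snippet below works on a handful of made-up (referer, article, count) tuples. The tuple layout, the sample values, the function names and the "external_search" referer label are all assumptions for illustration and do not describe the actual file format of the release. The click-through fraction is only an approximation: it divides the counted clicks out of an article by the counted requests into it.

```python
from collections import defaultdict

# Hypothetical sample rows: (referer, article, count).
# These values are invented for illustration, not taken from the dataset.
rows = [
    ("external_search", "London", 100),
    ("England", "London", 50),
    ("London", "River_Thames", 30),
    ("London", "England", 20),
    ("external_search", "England", 50),
]


def top_outgoing(rows, article, k=10):
    """Most frequent links people click on from a given article."""
    out = [(dst, n) for src, dst, n in rows if src == article]
    return sorted(out, key=lambda pair: pair[1], reverse=True)[:k]


def top_referers(rows, article, k=10):
    """Most common referers people followed to reach an article."""
    inc = [(src, n) for src, dst, n in rows if dst == article]
    return sorted(inc, key=lambda pair: pair[1], reverse=True)[:k]


def click_through_fraction(rows, article):
    """Approximate share of traffic to an article that clicked a link in it."""
    incoming = sum(n for src, dst, n in rows if dst == article)
    outgoing = sum(n for src, dst, n in rows if src == article)
    return outgoing / incoming if incoming else 0.0


def transition_probabilities(rows):
    """First-order Markov chain: P(next article | current referer)."""
    totals = defaultdict(int)
    for src, _dst, n in rows:
        totals[src] += n
    probs = defaultdict(dict)
    for src, dst, n in rows:
        # Accumulate in case the same pair appears on more than one row.
        probs[src][dst] = probs[src].get(dst, 0.0) + n / totals[src]
    return probs


if __name__ == "__main__":
    print(top_outgoing(rows, "London"))
    print(top_referers(rows, "London"))
    print(click_through_fraction(rows, "London"))
    print(dict(transition_probabilities(rows)["London"]))
```

For data at the scale of the full release, a single pass that builds per-referer and per-article indexes (or a dataframe library) would be a better fit than these repeated list scans.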
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario