We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
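As a rough illustration of these uses, here is a minimal Python sketch that works on (referer, article, count) triples. The sample rows, the "(external)" referer label, and all function names are hypothetical and chosen only for illustration; the actual column layout of the released file should be checked against the dataset description before adapting this.

```python
from collections import defaultdict

# Hypothetical sample rows: (referer, article, count). Titles, counts, and the
# "(external)" label are made up for illustration, not taken from the dataset.
PAIRS = [
    ("Main_Page", "Cat", 50),
    ("Cat", "Dog", 30),
    ("Cat", "Lion", 10),
    ("Dog", "Cat", 20),
    ("(external)", "Cat", 100),
    ("(external)", "Dog", 60),
]


def top_links_from(pairs, source, k=3):
    """Most frequently clicked targets from `source`, highest count first."""
    out = [(article, n) for referer, article, n in pairs if referer == source]
    return sorted(out, key=lambda item: item[1], reverse=True)[:k]


def top_referers_to(pairs, target, k=3):
    """Most common referers leading to `target`, highest count first."""
    inc = [(referer, n) for referer, article, n in pairs if article == target]
    return sorted(inc, key=lambda item: item[1], reverse=True)[:k]


def click_through_rate(pairs, article):
    """Clicks leaving `article` divided by requests arriving at it."""
    incoming = sum(n for _, a, n in pairs if a == article)
    outgoing = sum(n for r, _, n in pairs if r == article)
    return outgoing / incoming if incoming else 0.0


def transition_probabilities(pairs):
    """Row-normalise counts into P(article | referer), a first-order Markov chain."""
    totals = defaultdict(int)
    for referer, _, n in pairs:
        totals[referer] += n
    chain = defaultdict(dict)
    for referer, article, n in pairs:
        row = chain[referer]
        row[article] = row.get(article, 0.0) + n / totals[referer]
    return dict(chain)


if __name__ == "__main__":
    print(top_links_from(PAIRS, "Cat"))
    print(top_referers_to(PAIRS, "Cat"))
    print(click_through_rate(PAIRS, "Cat"))
    print(transition_probabilities(PAIRS)["Cat"])
```

The list scans keep the sketch short; for the full 22 million pairs one would more likely index the rows by referer and by article first, or use a dataframe library.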
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Congratulations. Great effort. This opens the door for a lot of research based on Wikipedia.
With regards,
Ditty
On Wed, Feb 18, 2015 at 12:30 AM, Dario Taraborelli < dtaraborelli@wikimedia.org> wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi all,
For archive happiness:
The clickstream dataset is now being generated on a monthly basis for five Wikipedia languages (English, Russian, German, Spanish, and Japanese). You can access the data at https://dumps.wikimedia.org/other/clickstream/ and read more about the release and those who contributed to it at https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/
Best, Leila
-- Leila Zia Senior Research Scientist Wikimedia Foundation
On Tue, Feb 17, 2015 at 11:00 AM, Dario Taraborelli < dtaraborelli@wikimedia.org> wrote:
Just wanted to quickly thank Dario et al. for releasing these data, which are gold! I'd also like to (self-)promote a paper that we wrote based on the earlier releases of the data, to be presented at CompleNet'18 in Boston (March):
Inspiration, Captivation, and Misdirection: Emergent Properties in Networks of Online Navigation
https://arxiv.org/abs/1710.03326
P Gildersleve, T Yasseri - arXiv preprint arXiv:1710.03326, 2017
Best, Taha
On Tue, Jan 16, 2018 at 7:21 PM, Leila Zia leila@wikimedia.org wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l