On Tue, Jan 16, 2018 at 10:38 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Do I understand correctly that the 3% of "other" links are the ones that have articles at *this* time but did not exist at the time of the dump? So in effect they are not red links?
Per the description of "Other" in https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Format, the lines in the data labeled Other are those where both the referrer and the requested article exist in Wikipedia at the time the dumps are created, but the referrer article does not link to the requested article. This can happen, for example, if the user does an internal search and gets to the requested article from the referrer page.
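For anyone who wants to check this in the data, here is a minimal sketch (mine, not part of the release) that counts how many pairs in a monthly clickstream TSV carry the "other" label. It assumes the documented tab-separated columns prev, curr, type, n; the file name is hypothetical.

import gzip

# Hypothetical file name for one monthly English dump.
PATH = "clickstream-enwiki-2018-01.tsv.gz"

other_pairs = 0
total_pairs = 0
with gzip.open(PATH, "rt", encoding="utf-8") as f:
    for line in f:
        # Assumed four-column format: prev, curr, type, n (tab-separated).
        prev, curr, link_type, n = line.rstrip("\n").split("\t")
        total_pairs += 1
        if link_type == "other":
            other_pairs += 1

print(f"{other_pairs / total_pairs:.1%} of (referrer, article) pairs are labeled 'other'")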
The question about redlinks is a separate one, I think, and it goes back to your question at https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream#Not_foun... . Dario or others closer to the data will be able to comment on whether it's included in these recurring releases.
Is there any way to find the articles people were seeking but could not find?
If redlinks are included, part of this question can be addressed by this dataset, but not all.
Best, Leila
Thanks, GerardM
On 16 January 2018 at 20:21, Leila Zia leila@wikimedia.org wrote:
Hi all,
For archive happiness:
The clickstream dataset is now being generated on a monthly basis for 5 Wikipedia languages (English, Russian, German, Spanish, and Japanese). You can access the data at https://dumps.wikimedia.org/other/clickstream/ and read more about the release and those who contributed to it at https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/
Best, Leila
-- Leila Zia Senior Research Scientist Wikimedia Foundation
On Tue, Feb 17, 2015 at 11:00 AM, Dario Taraborelli < dtaraborelli@wikimedia.org> wrote:
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of *(referer, article)* pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million *(referer, article)* pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
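As a rough illustration of the last two points, here is a minimal sketch (mine, not part of the release) that reads a monthly clickstream TSV, keeps only article-to-article clicks, and builds per-article transition probabilities, i.e. the rows of a Markov chain over articles. The file name and the example article "London" are hypothetical; the assumed columns are prev, curr, type, n.

import gzip
from collections import defaultdict

# Hypothetical file name for one monthly English dump.
PATH = "clickstream-enwiki-2018-01.tsv.gz"

transitions = defaultdict(dict)   # prev article -> {curr article: click count}

with gzip.open(PATH, "rt", encoding="utf-8") as f:
    for line in f:
        prev, curr, link_type, n = line.rstrip("\n").split("\t")
        if link_type != "link":   # keep only clicks on links within articles
            continue
        transitions[prev][curr] = transitions[prev].get(curr, 0) + int(n)

def transition_probs(article):
    """Return (next article, probability) pairs, most likely first."""
    counts = transitions.get(article, {})
    total = sum(counts.values())
    return sorted(((curr, c / total) for curr, c in counts.items()),
                  key=lambda item: item[1], reverse=True)

# Example: the most frequent links people click on from a given article.
for curr, p in transition_probs("London")[:10]:
    print(f"{curr}\t{p:.3f}")

Note that holding the full English dataset in memory this way can take several gigabytes; for heavier use a streaming pass or a proper database is the safer choice.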
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l