Sorry, I hit submit too fast.
The *clickstream dataset* contains data on individual page requests: for any single page requested, the referrer is extracted when available. The *navigation vector* data Leila referred to instead measures visits to pages that co-occur within a browser session.
There is extensive documentation for each dataset on the corresponding Meta pages, and the author of the dataset (Ellery) produced notebooks (http://ewulczyn.github.io/Wikipedia_Clickstream_Getting_Started/) that should help you get started analyzing this data.
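To give a rough idea of what working with referrer-to-page counts looks like, here is a minimal sketch in Python. It assumes a tab-separated layout with the columns `prev`, `curr`, `type` and `n`; that layout is an assumption on my part (check the Meta page for the exact format of the release you download), and the sample rows and the `top_referrers` helper are made up purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column layout (prev, curr, type, n) is an
# assumption; verify it against the documentation of the actual release.
SAMPLE = (
    "prev\tcurr\ttype\tn\n"
    "other-search\tLondon\texternal\t120\n"
    "England\tLondon\tlink\t40\n"
    "England\tParis\tlink\t5\n"
    "Thames\tLondon\tlink\t15\n"
)


def top_referrers(tsv_text, target, k=3):
    """Return the k referrers sending the most requests to `target`."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = defaultdict(int)
    for row in reader:
        if row["curr"] == target:
            counts[row["prev"]] += int(row["n"])
    # Sort by descending count, breaking ties alphabetically by referrer.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    for referrer, n in top_referrers(SAMPLE, "London"):
        print(referrer, n)
```

For a real file you would pass its contents (or adapt the function to stream from an open file handle) instead of `SAMPLE`.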
Hope this helps! Dario
On Sat, Aug 27, 2016 at 1:16 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
The closest open dataset to what you are referring to is the clickstream dataset:
https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream
https://dx.doi.org/10.6084/m9.figshare.1305770
On Fri, Aug 26, 2016 at 2:38 PM, Leila Zia <leila@wikimedia.org> wrote:
On Fri, Aug 26, 2016 at 1:38 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Jan Dittrich, 26/08/2016 10:03:
or even click paths
Do you know about https://meta.wikimedia.org/wiki/Research:Improving_link_coverage/Release_page_traces ?
and https://meta.wikimedia.org/wiki/Research:Wikipedia_Navigation_Vectors ?
Leila
Nemo
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
--
*Dario Taraborelli* Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter http://twitter.com/readermeter