I've also contacted the researchers who partially released it, but making it publicly available is tricky for them because of its size (12 TB), a volume that might instead be well within the norm for what the Wikipedia servers handle daily.

Hello Mako,

On Wed, Sep 24, 2014 at 8:13 AM, Benj. Mako Hill <mako@atdot.cc> wrote:
>> Users mostly read the most recent version of a given page, but from time to
>> time, read accesses to the 'history' of a page happen.
>
> At least as far as I know, views to historical versions of webpages in
> Wikipedia don't show up in the access logs at all because certain
> kinds of requests (like requests to /w/index.php?oldid=NUMBER) don't
> get recorded in the pageview data.

I'm sorry to contradict you, but at least in the Wikibench traces, that information is very much present. I see things like:

1609418296 1190438479.078 http://en.wikipedia.org/w/index.php?title=Western_betrayal&oldid=9828122&action=raw
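
A minimal sketch of how the trace line above could be taken apart, in Python. It assumes whitespace-separated fields in the order seen in that line (a counter, a Unix timestamp, the URL); the function names and the labels they return are hypothetical, and the `action=history` branch assumes history views appear in the trace in MediaWiki's usual URL form.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def parse_trace_line(line):
    """Split one trace line into (counter, UTC datetime, URL).

    Assumption: fields are whitespace-separated, in the order
    counter, Unix timestamp, URL. Any extra fields are ignored.
    """
    counter, timestamp, url = line.split()[:3]
    when = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
    return int(counter), when, url


def classify_request(url):
    """Label a request by what its query string asks for."""
    query = parse_qs(urlsplit(url).query)
    if "oldid" in query:
        # A specific past revision was requested.
        return "old_revision"
    if query.get("action") == ["history"]:
        # Assumed form of a request for the page's history listing.
        return "history"
    return "other"


sample = ("1609418296 1190438479.078 "
          "http://en.wikipedia.org/w/index.php"
          "?title=Western_betrayal&oldid=9828122&action=raw")
counter, when, url = parse_trace_line(sample)
print(when.year, classify_request(url))  # prints: 2007 old_revision
```

Counting the labels over a whole trace file would give a rough share of accesses to old revisions, under the same assumptions.
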

That is, back in 2007, users were accessing a version of that page that dated back to 2005 or so.

>> New versions of a page are created as well. Finally, users might
>> potentially need to explore several old versions of a given web
>> page, for example by accessing the details of its history[1].
>
> AFAIK, viewing the history page itself is also not recorded in the
> page view data either.

Sorry to contradict you again, but there are indeed logs for that as well:

I'm quite surprised that such information is not known to the community of Wikipedia researchers.

Best,
Valerio
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l