Anthony wrote:
On Thu, Apr 22, 2010 at 6:31 PM, Platonides <Platonides@gmail.com> wrote:
S. Nunes wrote:
Hi all,
I presume that Wikipedia keeps data about HTTP accesses to all articles.
Can anybody inform me if this data is available for research purposes?
No. With the amount of traffic it has, the space needs would be immense, and
Wikimedia is not interested in logging all accesses.
What kind of space needs are we talking about?
100k requests per second.
Assuming that a URL is 50 bytes on average, that's 432 GB per day (the
usual Apache log line is about 1.5 times that).
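As a rough back-of-the-envelope sketch of that figure (using only the 100k
requests per second and 50-byte average URL quoted above; the variable names
are mine and purely illustrative):

    requests_per_second = 100_000   # figure quoted above
    bytes_per_url = 50              # assumed average URL size
    seconds_per_day = 86_400
    gb_per_day = requests_per_second * bytes_per_url * seconds_per_day / 1e9
    print(gb_per_day)               # -> 432.0 GB of raw URLs per day

With full Apache-style log lines at roughly 1.5x that size, you would be well
past half a terabyte per day before any compression.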
Most requests are handled by the Squids, so the backend servers are not
even aware of them. Tim Starling had to write a patch for Squid in order
to record which articles are accessed (i.e., the data behind Domas' wikistats).
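To illustrate the kind of aggregated per-article counting being described
(this is not Tim Starling's actual Squid patch, just a toy sketch of the idea):

    from collections import Counter

    # Purely illustrative in-memory tally of page hits.
    hits = Counter()

    def count_hit(article_title):
        # Increment a per-article counter instead of writing a full log line.
        hits[article_title] += 1

Flushing such counters periodically and publishing the totals is roughly the
shape of the data that ends up in wikistats, with no per-request log retained.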
I find it hard to imagine that the other top 10 websites aren't keeping
this information.
They probably store it aggregated and/or just a sample.
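A minimal sketch of what sampled request logging could look like (purely
illustrative; the 1-in-1000 rate and the function name are assumptions, not a
description of any actual pipeline):

    import random

    SAMPLE_RATE = 1000  # assumption: keep roughly 1 request in 1000

    def maybe_log(log_file, request_line):
        # Write only a random sample of request lines; the rest are dropped.
        if random.randrange(SAMPLE_RATE) == 0:
            log_file.write(request_line + "\n")

Aggregation (per-article counters, as above) and sampling like this keep the
storage cost a tiny fraction of logging every access.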
Shouldn't you be logging every access, at least for a few days, in case
of some sort of security breach?
You would need to:
a) detect that there is a security breach, and
b) find what produced it in that log.
What if your referer was your Facebook personal page, leaking your full
real name?
And what if you're in the sample? I find it quite inappropriate that
even sampled data like this is being released.
The referer is not stored anywhere.