Brion Vibber wrote:
> Miguel Chaves wrote:
>> Hi, I wonder if Wikipedia only relies on this sort of external
>> statistics (like Alexa) to gather information about visits to the
>> sites. Aren't there statistics collected on the Wikipedia servers
>> themselves? That would be more useful and reliable.
> Not at this time. At our traffic level, web server logs are too large to handle
> comfortably without a dedicated infrastructure, and we've been forced to simply
> disable them until something easier to handle gets set up.
> (If we were an ad-supported site, such statistics would be much much more
> important and we'd have put in the time and money for it a lot sooner.)
>> BTW, if we want to know the popularity of a specific article (not a
>> specific wikipedia), is there a tool for that?
> Not really, sorry.
> -- brion vibber (brion @ pobox.com)
Since the traffic is so vast, why not use random sampling? On each page
hit, call a random-number generator (e.g. read four bytes from
/dev/urandom, or call a seeded pseudo-random number routine), and make a
log entry only if the result is 0 mod 1000. That way, the logs will be
statistically representative, but will require only a relatively tiny
amount of disk I/O, compute time, and disk space.
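A minimal Python sketch of that sampling idea. The function names here
(should_log, handle_hit) are illustrative, not anything in MediaWiki; the
point is just the 1-in-1000 decision made from four bytes of /dev/urandom:

```python
import os

SAMPLE_RATE = 1000  # log roughly 1 hit in 1000

def should_log() -> bool:
    """Decide whether to log this page hit.

    Reads four bytes from the OS entropy source (/dev/urandom on Linux)
    and logs only when the value is 0 mod SAMPLE_RATE.
    """
    value = int.from_bytes(os.urandom(4), "big")
    return value % SAMPLE_RATE == 0

def handle_hit(url: str, log: list) -> None:
    """Hypothetical per-request hook: append a sampled log entry."""
    if should_log():
        log.append(url)  # a real logger would write a line to disk
```

To recover estimated totals from such a log, multiply each sampled count
by SAMPLE_RATE; with four random bytes (about 4.3 billion values) the
mod-1000 bias is negligible.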
Alternatively, you could log using UDP syslog, and have a listener that
threw away 999 out of 1000 packets.
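The listener variant could look something like the sketch below. This is
an assumption about how one might structure it, not an existing tool: the
sampling filter is split out so it can be tested apart from the socket,
and the port number (5140) is arbitrary (classic syslog uses UDP 514).

```python
import random
import socket
from typing import Iterable, Iterator

KEEP_ONE_IN = 1000  # throw away 999 out of 1000 packets

def sample(packets: Iterable[bytes], rng: random.Random) -> Iterator[bytes]:
    """Yield roughly one packet in KEEP_ONE_IN, discarding the rest."""
    for packet in packets:
        if rng.randrange(KEEP_ONE_IN) == 0:
            yield packet

def udp_packets(host: str = "0.0.0.0", port: int = 5140) -> Iterator[bytes]:
    """Receive UDP syslog datagrams forever and yield them one by one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        datagram, _addr = sock.recvfrom(65535)
        yield datagram

# A real listener would then do something like:
#   for packet in sample(udp_packets(), random.Random()):
#       logfile.write(packet + b"\n")
```

Since UDP is lossy anyway, dropping packets in the listener costs nothing
extra in reliability, and the web servers never block on logging.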