2012/4/8 Erik Zachte <ezachte(a)wikimedia.org>
Hi Lars,
You have a point here, especially for smaller projects:
For Swedish Wikisource:
zcat sampled-1000.log-20120404.gz | grep 'GET http://sv.wikisource.org' | awk '{print $9, $11, $14}'
returns 20 lines from this 1:1000 sampled squid log file.
After removing javascript/json/robots.txt requests, 13 remain,
which fits nicely with 10,000 to 13,000 page views per day.
However, 9 of those 13 are from bots!
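The counting above can be sketched end to end on a synthetic sample. Note the log lines below are invented for illustration (real Wikimedia squid log lines have many more fields, which is why Erik's awk picks $9, $11, $14); the filter terms mirror the ones mentioned in the thread, and each surviving line is extrapolated by the 1:1000 sampling factor:

```shell
#!/bin/sh
# Write a tiny fake 1:1000 sampled log. These lines are invented for
# illustration only; the real squid log format differs.
cat > sampled.log <<'EOF'
GET http://sv.wikisource.org/wiki/Huvudsida 200 text/html Mozilla/5.0
GET http://sv.wikisource.org/w/index.php 200 text/javascript Mozilla/5.0
GET http://sv.wikisource.org/robots.txt 200 text/plain Googlebot/2.1
GET http://sv.wikisource.org/wiki/Sida 200 text/html Mozilla/5.0
EOF

# Keep requests to the project, then drop javascript/json/robots.txt,
# as in the thread. tr strips wc's leading whitespace on some systems.
hits=$(grep 'GET http://sv.wikisource.org' sampled.log \
  | grep -v -e 'javascript' -e 'json' -e 'robots\.txt' \
  | wc -l | tr -d ' ')

# Each sampled line stands for roughly 1000 real requests.
echo "sampled page views: $hits"
echo "estimated daily page views: $((hits * 1000))"
```

With the four fake lines above, two survive the filters, giving an estimate of 2000 per day; on the real file the same pipeline is what turns 13 surviving lines into the 10,000 to 13,000 figure.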
How many requests in that 1:1000 sampled log were from robots (across all languages)?
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> | StatMediaWiki <http://statmediawiki.forja.rediris.es> | WikiEvidens <http://code.google.com/p/wikievidens/> | WikiPapers <http://wikipapers.referata.com> | WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/