2012/4/8 Erik Zachte ezachte@wikimedia.org
Hi Lars,
You have a point here, especially for smaller projects:
For Swedish Wikisource:
zcat sampled-1000.log-20120404.gz | grep 'GET http://sv.wikisource.org' | awk '{print $9, $11,$14}'
returns 20 lines from this 1:1000 sampled squid log file. After removing javascript/json/robots.txt, there are 13 left, which fits perfectly with 10,000 to 13,000 per day.
However, 9 of these are bots!!
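To make the arithmetic explicit, here is a minimal sketch of the extrapolation, assuming each line in the 1:1000 sampled log stands for roughly 1000 real requests. The function and variable names are made up for illustration; the counts (13 lines left, 9 of them bots) are the ones quoted above.

```python
# Rough extrapolation from a 1:1000 sampled log to daily totals.
# Assumption: each sampled line represents about 1000 actual requests.
SAMPLING_FACTOR = 1000


def extrapolate(sampled_lines: int, factor: int = SAMPLING_FACTOR) -> int:
    """Scale a count of sampled log lines up to an estimated real count."""
    return sampled_lines * factor


page_requests = 13   # lines left after removing javascript/json/robots.txt
bot_requests = 9     # of those, lines identified as bots
human_requests = page_requests - bot_requests

print(extrapolate(page_requests))                      # 13000
print(extrapolate(human_requests))                     # 4000
print(f"bot share: {bot_requests / page_requests:.0%}")  # bot share: 69%
```

With only 13 sampled lines the estimate is very noisy, so the roughly 4,000 non-bot requests per day should be read as an order of magnitude at best.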
How many of the requests in that 1:1000 sampled log were from robots (across all languages)?