Lars,
You're quite right that the numbers are inflated, and we've been over this before [1]. Below are some sampled data for da.wiktionary from webstatscollector [2] and the squid log [3]. Bot traffic is a substantial share of 'page views' (but not the majority, as you suggest).
We discussed this extensively in April and, as I remember (my mail archive is somehow incomplete), decided to implement a second, cleaned-up stream without /bot/crawler/spider/http (keeping the original stream so as not to break trend lines).
However, that bot-free stream (projectcounts files with an extra set of per-wiki totals) has not happened yet, and I'm pretty sure we have changed plans since and are probably now waiting for Kraken. Diederik, can you add to this?
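For illustration, the cleaned-up stream described above could be filtered roughly as follows. This is only a minimal sketch, not the actual webstatscollector or Kraken logic: the names `is_bot` and `split_counts`, the choice to match against the user-agent string, and the case-insensitive substring match are all assumptions. Only the four patterns come from the discussion.

```python
# Hypothetical sketch of the bot filter; only the four patterns come from this thread.
BOT_PATTERNS = ("bot", "crawler", "spider", "http")


def is_bot(user_agent: str) -> bool:
    """Return True if the user-agent string contains any pattern (case-insensitive)."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in BOT_PATTERNS)


def split_counts(user_agents):
    """Count requests for the original stream (all) and the cleaned-up stream (non-bot)."""
    total = 0
    clean = 0
    for ua in user_agents:
        total += 1
        if not is_bot(ua):
            clean += 1
    return total, clean
```

Keeping both totals side by side is what would let the original trend lines continue while the cleaned-up numbers are reported next to them.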
Cheers,
Erik
[1] On April 8, 2012 you reported a similar issue for Swedish Wikipedia. At the time I checked one hour of the sampled squid log: 9 out of 13 requests were bots.
[2] I just checked the hourly page views reported by webstatscollector: yesterday's hourly average was 619. Monthly would then be 445K. So based on this file (which feeds the report) we really seem to get this many messages. You can check at http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-02/ by grepping for da.d (dictionary) in the projectcounts files.
[3] The 1:1000 sampled squid log for Jan 31 has 15 lines with da.wiktionary and html. That matches nicely with projectcounts (619*24=14856 per day, or about 15 in a 1:1000 sampled log). 9 seem to be legitimate browser requests, 4 are Googlebot, and 1 is feedfetcher.
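The cross-check in [2] and [3] can be reproduced with a few lines of arithmetic. This is just a sketch of the reasoning: the 30-day month and counting feedfetcher as automated traffic are assumptions, and the breakdown above accounts for only 14 of the 15 sampled lines.

```python
# Figures taken from footnotes [2] and [3].
HOURLY_AVG = 619     # average hourly page views for da.d in projectcounts
SAMPLE_RATE = 1000   # squid log is sampled 1:1000

daily = HOURLY_AVG * 24                       # page views per day
monthly = daily * 30                          # assumes a 30-day month
expected_sampled_lines = daily / SAMPLE_RATE  # lines expected in the sampled log

# Breakdown of the 15 sampled lines from [3]; one line is not categorized there.
sampled_lines = 15
browser_lines = 9
bot_lines = 4 + 1    # Googlebot plus feedfetcher (assumed to be automated)
bot_share = bot_lines / sampled_lines

print(daily)                      # 14856
print(monthly)                    # 445680
print(expected_sampled_lines)     # 14.856
print(round(bot_share, 2))        # 0.33
```

On this small sample, roughly a third of the requests are automated, which is consistent with bots being a substantial share but not the majority.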
-----Original Message-----
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Matthew Flaschen
Sent: Wednesday, February 13, 2013 10:37 PM
To: analytics@lists.wikimedia.org; lars@aronsson.se
Subject: [Analytics] Fwd: [Wikitech-l] Page view stats we can believe in
I'm forwarding to the Analytics list, which is a better place to discuss this.
Matt Flaschen
-------- Original Message --------
Subject: [Wikitech-l] Page view stats we can believe in
Date: Wed, 13 Feb 2013 22:18:44 +0100
From: Lars Aronsson lars@aronsson.se
Reply-To: Wikimedia developers wikitech-l@lists.wikimedia.org
To: Wikimedia developers wikitech-l@lists.wikimedia.org
I stumbled on the Danish Wiktionary, of all projects. Danish is the 68th biggest language of Wiktionary, and has a little more than 8,000 articles in total. Most of these articles are very short and provide no value to a reader. There is no reason to link to them, and so it is very unlikely that the next user would stumble upon them, unless that user is me.
Yet wikistats tries to make me believe that this tiny project has 400,000 or 500,000 page views each month, and has had that many for a long time: http://stats.wikimedia.org/wiktionary/EN/TablesPageViewsMonthly.htm
(I'm not talking about January 2012, which seems to have been an error, and reports 2-3 times that many views.)
My guess is that da.wiktionary has 4,000 page views per month, not 400,000. It's more likely that 400,000 is some background noise, an offset number that should be subtracted from the number of page views for any project.
If you look at the log files for just one day, you should see my IP address (85.228.something) and 3-4 other users who have been editing lately, and not many more people, but perhaps a bunch of interwiki bots.
We need an explanation for these vastly inflated page view statistics.