I stumbled on the Danish Wiktionary, of all projects. Danish is the 68th biggest language of Wiktionary, and has a little more than 8,000 articles in total. Most of these articles are very short and provide no value to a reader. There is no reason to link to them, so it is very unlikely that anyone other than me would stumble upon them.
Yet, wikistats tries to make me believe that this tiny project has 400,000 or 500,000 page views each month, and has had for a long time: http://stats.wikimedia.org/wiktionary/EN/TablesPageViewsMonthly.htm
(I'm not talking about January 2012, which seems to have been an error, and reports 2-3 times that many views.)
My guess is that da.wiktionary has 4,000 page views per month, not 400,000. It's more likely that 400,000 is some background noise, an offset number that should be subtracted from the number of page views for any project.
If you look at the log files for just one day, you should see my IP address (85.228.something) and 3-4 other users who have been editing lately, and not many more people, but perhaps a bunch of interwiki bots.
We need an explanation for these vastly inflated page view statistics.
according to http://stats.grok.se/da.d/latest90/mandag the page "mandag" has been viewed 127 times in the last 3 months, and ranks 927th. the raw pagecount files are here: http://dumps.wikimedia.org/other/pagecounts-raw/
i then took an arbitrary file and looked into it: midnight (UTC, i guess) on feb 1st. as all projects are in this file, let's grep for danish wiktionary, i.e. lines starting with "da.d ":
    grep '^da.d\s' pagecounts-20130201-000000 | wc
        569    2276   19572
this means 569 pages were accessed at least once in this hour. so let's sort by the third column, which is the number of accesses per page. the largest counts end up at the bottom, so let's take the last 20 lines:
    grep '^da.d\s' pagecounts-20130201-000000 | sort -k3n,3 | tail -20
    da.d pony 2 30008
    da.d skak 2 44151
    da.d Speciel:Eksporter/engelsk 2 7818
    da.d Speciel:Eksporter/hyle 2 4630
    da.d Speciel:Eksporter/krog 2 4632
    da.d Speciel:Eksporter/skaml%C3%A6ber 2 4632
    da.d Forside 3 96050
    da.d horse 3 54974
    da.d interessant 3 9339
    da.d Speciel:Eksporter/arrang%C3%B8rer 3 6948
    da.d Speciel:Eksporter/b%C3%B8ger 3 6948
    da.d Speciel:Eksporter/forg%C3%A6ves 3 6946
    da.d Speciel:Eksporter/hensigtsm%C3%A6ssig 3 6946
    da.d Speciel:Eksporter/hvad 3 9900
    da.d Speciel:Eksporter/indvendig 3 6948
    da.d Speciel:Eksporter/k%C3%A6le 3 6948
    da.d Speciel:Eksporter/monogame 3 6944
    da.d Speciel:Eksporter/revet 3 6946
    da.d Speciel:Eksporter/topstykke 3 6944
    da.d springer 3 45292
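Note that wc counts lines, i.e. distinct titles, while the third column holds the number of requests per title. A minimal sketch for getting both numbers at once follows. The function name project_totals is made up for this illustration, and it assumes the space-separated layout seen in the listing (project, title, requests, bytes). The sample input is three lines copied from that listing.

```shell
# Hypothetical helper: read pagecounts lines on stdin and print
# "<distinct titles> <total requests>" for the project given as $1.
project_totals() {
    awk -v proj="$1" '
        $1 == proj { titles++; requests += $3 }   # column 3 = requests in this hour
        END { printf "%d %d\n", titles, requests }
    '
}

# Three lines taken from the listing: 3 titles, 3 + 3 + 2 = 8 requests.
printf '%s\n' \
    'da.d Forside 3 96050' \
    'da.d Speciel:Eksporter/hvad 3 9900' \
    'da.d pony 2 30008' | project_totals da.d
```

On the real file this would be run as: project_totals da.d < pagecounts-20130201-000000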
this means that e.g. "springer" was supposedly accessed 3 times in that hour. the article does not exist, but there is a red link to it from http://da.wiktionary.org/wiki/Wiktionary:Top_10000_(Dansk).
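As a rough cross-check against the monthly figure: if every hour looked like this one, and each of the 569 titles were requested only once, that alone would add up to roughly 400,000 requests in a 30-day month, the same order of magnitude as the wikistats number. This is back-of-the-envelope arithmetic only, under the assumption that this hour is typical, which a single file cannot show. The name monthly_estimate is made up for the sketch.

```shell
# Hypothetical helper: scale an hourly count to a 30-day month.
monthly_estimate() {
    echo $(( $1 * 24 * 30 ))
}

monthly_estimate 569   # prints 409680
```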
rupert.
On Wed, Feb 13, 2013 at 10:18 PM, Lars Aronsson lars@aronsson.se wrote:
[...]
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 02/14/2013 12:03 AM, rupert THURNER wrote:
> this means 569 pages accessed in this hour, at least once.
Thanks for taking the time to do this check! This number is already unreasonable for an obscure project with 8,000 articles.
> da.d Speciel:Eksporter/engelsk 2 7818
Should Special:Export ever count as page views? Anyway, there are no humans using Special:Export on da.wiktionary in the middle of the night.
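For comparison, a minimal sketch of dropping special pages before counting. It assumes that matching the localized "Speciel:" prefix seen in the listing, plus the canonical "Special:", is enough to identify them. The name drop_special is made up, and the filter says nothing about bots requesting ordinary titles.

```shell
# Hypothetical filter: pass through pagecounts lines whose title (column 2)
# is not in the Special namespace. The prefixes are an assumption based on
# the listing quoted in this thread.
drop_special() {
    awk '$2 !~ /^(Speciel|Special):/'
}

# Four lines from the listing; only the two ordinary titles survive.
printf '%s\n' \
    'da.d skak 2 44151' \
    'da.d Speciel:Eksporter/engelsk 2 7818' \
    'da.d Forside 3 96050' \
    'da.d Speciel:Eksporter/hvad 3 9900' | drop_special
```

Piping the result into wc -l gives the number of remaining titles.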
> this means that e.g. "springer" was supposedly accessed 3 times in that hour. the article does not exist, but there is a red link out of http://da.wiktionary.org/wiki/Wiktionary:Top_10000_(Dansk).
So are there some stupid bots that follow red links? There could be a large number of such accesses on Wiktionary (in any language) because there are so many red links. But bots should never be counted among the page views.
Hi all,
Lars, Rupert, thanks for flagging this. You are quite right: the numbers are too high because webstatscollector, the software that does the counts, just counts every request as a hit, including bots, error pages, etc.
I am planning on running a sprint at the Amsterdam Hackathon to build an easily queryable datastore with clean pageview counts. Please let me know if you are interested in this so I can pitch it.
Best, Diederik
On Wed, Feb 13, 2013 at 3:36 PM, Lars Aronsson lars@aronsson.se wrote:
[...]