Hello Anthony,
I'm back at my lair (phew, finally ;-)
Regarding the files at
http://dammit.lt/wikistats/ :
What are "en.b", "en.d", "en2", etc?
The suffixes indicate projects - from
http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/filter.c?…
:
projects[] = {
    {"wikipedia","",NULL},
    {"wiktionary",".d",NULL},
    {"wikinews",".n",NULL},
    {"wikimedia",".m",check_wikimedia},
    {"wikibooks",".b",NULL},
    {"wikisource",".s",NULL},
    {"mediawiki",".w",NULL},
    {"wikiversity",".v",NULL},
    {"wikiquote",".q",NULL},
    NULL
},
en2 is, um, http://en2.wikipedia.org/ ;-) It used to exist once upon a
time, and apparently there are still some referrals to it.
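
To make the suffix scheme concrete, here is a minimal sketch of how a reader of the dumps might split a code such as "en.b" into a language part and a project name. The table is copied from the projects[] list quoted above (filter callbacks left out); the struct, the helper name decode_project() and its behaviour are assumptions made for illustration, not code from webstatscollector.

```c
#include <stddef.h>
#include <string.h>

/* Suffix table copied from the projects[] list quoted in this mail.
   The third (filter callback) column is omitted in this sketch. */
struct project_suffix {
    const char *name;
    const char *suffix;
};

static const struct project_suffix suffix_table[] = {
    {"wikipedia",   ""},
    {"wiktionary",  ".d"},
    {"wikinews",    ".n"},
    {"wikimedia",   ".m"},
    {"wikibooks",   ".b"},
    {"wikisource",  ".s"},
    {"mediawiki",   ".w"},
    {"wikiversity", ".v"},
    {"wikiquote",   ".q"},
};

/* Hypothetical helper: splits a code such as "en.b" at its first dot.
   The part before the dot is copied into lang; the rest (or "" when
   there is no dot) is looked up in the table. Returns the project
   name, or NULL if the suffix is unknown or lang is too small. */
const char *decode_project(const char *code, char *lang, size_t lang_size)
{
    const char *dot = strchr(code, '.');
    const char *suffix = dot ? dot : "";
    size_t lang_len = dot ? (size_t)(dot - code) : strlen(code);
    size_t i;

    if (lang_size == 0 || lang_len >= lang_size)
        return NULL;
    memcpy(lang, code, lang_len);
    lang[lang_len] = '\0';

    for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++) {
        if (strcmp(suffix_table[i].suffix, suffix) == 0)
            return suffix_table[i].name;
    }
    return NULL;
}
```

With this sketch, "en.b" gives language "en" and project "wikibooks", while "en2" has no suffix and so falls under "wikipedia" with language part "en2".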
Are edits included, or only views?
That is views only. You can find the actual logic in the above file,
but it mostly comes down to this pattern:
http://*.*.org/wiki/*
which is what we have for special pages and views.
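
As a rough illustration of that pattern only, the glob can be checked with POSIX fnmatch(). This is an approximation for readers and not how filter.c does it (the real logic is in that file); the function name looks_like_page_view() is made up for this sketch.

```c
#include <fnmatch.h>

/* Returns 1 if url matches the glob quoted in this mail, else 0.
   With flags == 0, '*' in fnmatch() also matches '/' and '.',
   so the trailing '*' covers the whole page title. */
int looks_like_page_view(const char *url)
{
    return fnmatch("http://*.*.org/wiki/*", url, 0) == 0;
}
```

Under this approximation a URL like http://en.wikipedia.org/wiki/Main_Page matches, while an index.php edit URL under /w/ does not.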
Are the hit counts actual, or 1/10th sampled, or
something else?
They are actual, with duplicates removed (that is, we don't count
cache-to-cache traffic, only end-user-to-cache).
pagecounts-20090501-200000.gz
<http://dammit.lt/wikistats/pagecounts-20090501-200000.gz> is
the hour *beginning* 20:00:00?
Ending, I think. Let me check... yes, end time. The logic is in
produceDump() at
http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/collector…
:)
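
For anyone scripting against the file names, here is a small sketch that pulls the timestamp fields out of a name such as pagecounts-20090501-200000.gz. The struct and function names are hypothetical, and the field layout (YYYYMMDD-HHMMSS) is assumed from the example name rather than taken from produceDump().

```c
#include <stdio.h>

/* Timestamp fields taken from a dump file name. Per the answer above,
   this timestamp marks the END of the counted period. */
struct dump_stamp {
    int year, month, day;
    int hour, minute, second;
};

/* Hypothetical helper: parses "pagecounts-YYYYMMDD-HHMMSS..." names.
   Returns 1 if all six fields were read, 0 otherwise. */
int parse_dump_name(const char *name, struct dump_stamp *out)
{
    int n = sscanf(name, "pagecounts-%4d%2d%2d-%2d%2d%2d",
                   &out->year, &out->month, &out->day,
                   &out->hour, &out->minute, &out->second);
    return n == 6;
}
```

So for the file in question, the parsed 20:00:00 on 2009-05-01 would be the end of the counted hour, not its start.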
I think I may end up documenting this somewhat more, but I need to do
some promised and long overdue development on this project.
Domas