Hello Anthony,
I'm back at my lair (phew, finally ;-)
Regarding the files at http://dammit.lt/wikistats/ : What are "en.b", "en.d", "en2", etc?
The suffixes indicate projects. From http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/filter.c?r... :
projects[] = {
    {"wikipedia","",NULL},
    {"wiktionary",".d",NULL},
    {"wikinews",".n",NULL},
    {"wikimedia",".m",check_wikimedia},
    {"wikibooks",".b",NULL},
    {"wikisource",".s",NULL},
    {"mediawiki",".w",NULL},
    {"wikiversity",".v",NULL},
    {"wikiquote",".q",NULL},
    NULL
},
en2 is, um, http://en2.wikipedia.org/ ;-) It used to exist once upon a time, and apparently there are some referrals to it.
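For anyone reading the files, here is a rough Python sketch of how a project code such as "en.b" could be split into a language prefix and a project name, using the suffix table from the projects[] array quoted above. The function name split_project_code and the dictionary are made up for illustration, and the sketch ignores whatever extra checking check_wikimedia does for ".m" entries.

```python
# Suffix table copied from the projects[] array quoted above.
SUFFIX_TO_PROJECT = {
    "": "wikipedia",
    ".d": "wiktionary",
    ".n": "wikinews",
    ".m": "wikimedia",
    ".b": "wikibooks",
    ".s": "wikisource",
    ".w": "mediawiki",
    ".v": "wikiversity",
    ".q": "wikiquote",
}


def split_project_code(code):
    """Split a code like 'en.b' into (prefix, project name).

    Hypothetical helper: assumes the code is '<prefix><suffix>' where the
    suffix is either empty or a dot followed by a single letter.
    """
    prefix, dot, rest = code.partition(".")
    suffix = dot + rest  # "" when the code contains no dot
    if suffix not in SUFFIX_TO_PROJECT:
        raise ValueError("unknown project suffix: %r" % suffix)
    return prefix, SUFFIX_TO_PROJECT[suffix]
```

With this, "en.b" comes out as ("en", "wikibooks"), and "en2" as ("en2", "wikipedia"), since an empty suffix means Wikipedia.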
Are edits included, or only views?
That is views only. You can find the actual logic in the file above, but it is mostly this pattern:
which is what we have for special pages and views.
Are the hit counts actual, or 1/10th sampled, or something else?
They are actual, with duplicates removed (that is, we don't count in cache-to-cache traffic, only end-user-to-cache).
pagecounts-20090501-200000.gz <http://dammit.lt/wikistats/pagecounts-20090501-200000.gz> is the hour *beginning* 20:00:00?
Ending, I think. Let me check... yes, end time. The logic is in produceDump() at http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/collector.... :)
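Since the timestamp in the file name is the end time, a consumer has to step back to get the period a file covers. A minimal Python sketch, assuming each file covers exactly one hour and that names always look like pagecounts-YYYYMMDD-HHMMSS.gz (the function name covered_interval is hypothetical, and the sketch says nothing about time zone):

```python
import re
from datetime import datetime, timedelta

# Assumed file name layout: pagecounts-YYYYMMDD-HHMMSS.gz
_NAME_RE = re.compile(r"pagecounts-(\d{8})-(\d{6})\.gz")


def covered_interval(filename):
    """Return (start, end) for a dump file, treating the name as the END time.

    Assumption: every file covers exactly one hour.
    """
    name = filename.rsplit("/", 1)[-1]  # tolerate a full URL or path
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % filename)
    end = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    return end - timedelta(hours=1), end
```

Under those assumptions, pagecounts-20090501-200000.gz would cover 19:00:00 to 20:00:00 on 2009-05-01.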
I think I may end up documenting this somewhat more, but I need to do some promised and long overdue development on this project.
Domas