New statistics stuff - Wikitech-l

17 May 2008


Helloes,
There are a few new things at http://dammit.lt/wikistats/
1) All projects are included. Non-Wikipedia projects will have a suffix
in the raw data. The suffixes are pretty much self-explanatory (haha).
wiktionary: .d
wikinews: .n
wikimedia: .m (meta, commons et al)
wikibooks: .b
wikisource: .s
mediawiki: .w
wikiversity: .v
wikiquote: .q
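A minimal sketch of how the suffixes above might be decoded. It assumes a project code is a language (or site) name optionally followed by a dot and one of the suffixes, e.g. "en.d", and that a code with no suffix means Wikipedia; the function name `split_project` is made up for illustration.

```python
# Suffix table copied from the list above.
SUFFIXES = {
    "d": "wiktionary",
    "n": "wikinews",
    "m": "wikimedia",   # meta, commons et al
    "b": "wikibooks",
    "s": "wikisource",
    "w": "mediawiki",
    "v": "wikiversity",
    "q": "wikiquote",
}


def split_project(code):
    """Split a project code such as 'en.d' into (language, project).

    Assumption: a code without a suffix refers to Wikipedia.
    """
    lang, dot, suffix = code.partition(".")
    if not dot:
        return lang, "wikipedia"
    # Unknown suffixes are reported rather than guessed.
    return lang, SUFFIXES.get(suffix, "unknown")


print(split_project("en.d"))
print(split_project("en"))
```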
2) For lazy people there will be daily packages, which will:
- Have a single .tgz archive with per-project files inside (no more  
splitting!)
- Um, daily aggregation, instead of hourly
- Pages with a low number of reads will not be included (a page needs
at least 10 daily visits to be included)
- Files are generally much, much smaller (enwiki daily compressed
filtered data is just 5MB)
For now the build process will go back just a week, but over time the
archive may become bigger.
This will also reduce the hourly data retention (unless archive.org  
or someone wishes to archive everything)
I'll also be in the process of upgrading my box (or maybe moving to the
new shiny stats server we may get some day :) ), because it takes an hour
to actually process the data on my 3-year-old flake :)
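The daily aggregation and the 10-visit cutoff from item 2 could be sketched roughly like this. It is not the actual build script; it assumes the hourly data has already been loaded into one {page title: views} dict per hour, and the names `aggregate_daily` and `MIN_DAILY_VISITS` are made up for illustration.

```python
from collections import Counter

MIN_DAILY_VISITS = 10  # threshold quoted in item 2


def aggregate_daily(hourly_counts):
    """Sum per-page views across hourly dicts, then drop rarely read pages.

    hourly_counts: iterable of {page_title: views} mappings, one per hour
    (an assumed in-memory shape, not the on-disk format).
    """
    daily = Counter()
    for hour in hourly_counts:
        daily.update(hour)  # Counter.update adds counts per key
    return {page: n for page, n in daily.items() if n >= MIN_DAILY_VISITS}


hours = [{"Main_Page": 6, "Obscure_stub": 1}, {"Main_Page": 5, "Obscure_stub": 2}]
print(aggregate_daily(hours))
```

Only pages whose summed count reaches the threshold survive, which is where most of the size reduction would come from.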
3) The second number is now actually bytes, in case anyone is interested :)
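For anyone parsing the raw files, a small sketch of reading one line with the second number treated as bytes. The line layout used here (project code, page title, request count, bytes, separated by whitespace) is an assumption, as is the sample line; check a real file before relying on it.

```python
def parse_line(line):
    """Parse one raw line into (project, title, requests, bytes).

    Assumed layout: '<project> <title> <requests> <bytes>'.
    """
    project, title, requests, size = line.split()
    return project, title, int(requests), int(size)


# Hypothetical sample line, made up for illustration.
print(parse_line("en.d example 12 34567"))
```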
I've been getting various feedback lately from the non-wiki world, where
people use this data for popularity ranking of various bits.
BR,
-- 
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]