Helloes,
There are a few new things at http://dammit.lt/wikistats/
1) All projects are included. Non-Wikipedia projects have a suffix in the raw data. The suffixes are pretty much self-explanatory (haha):
   wiktionary:  .d
   wikinews:    .n
   wikimedia:   .m (meta, commons et al)
   wikibooks:   .b
   wikisource:  .s
   mediawiki:   .w
   wikiversity: .v
   wikiquote:   .q
2) For lazy people there will be daily packages, which will:
   - come as a single .tgz archive with per-project files inside (no more splitting!)
   - um, be aggregated daily instead of hourly
   - leave out pages with a low number of reads (a page needs at least 10 daily visits to be included)
   - generally be much, much smaller (enwiki daily compressed filtered data is just 5MB)
For now the build process will go back just a week, but over time the archive may grow. This will also reduce the hourly data retention (unless archive.org or someone else wishes to archive everything).
I'll also be in the process of upgrading my box (or maybe moving to the new shiny stats server we may get some day :) - because it takes an hour to actually process the data on my 3-year-old flake :)
3) The second number is now actually bytes, in case anyone is interested :)
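As a rough illustration of items 1 and 3, here is a minimal Python sketch of reading one raw line. The suffix table is taken from the list above. The line layout (`<project> <title> <count> <bytes>`, space-separated), the rule that a code without a suffix means Wikipedia, and the helper names `split_project` and `parse_line` are assumptions made for the example, not something stated in this thread.

```python
# Suffix table as given in the announcement.
SUFFIX_TO_PROJECT = {
    "d": "wiktionary",
    "n": "wikinews",
    "m": "wikimedia",
    "b": "wikibooks",
    "s": "wikisource",
    "w": "mediawiki",
    "v": "wikiversity",
    "q": "wikiquote",
}


def split_project(code):
    """Split a project code such as 'en.d' into (prefix, project name)."""
    prefix, dot, suffix = code.partition(".")
    if not dot:
        # Assumption: a code without a suffix is a Wikipedia project.
        return prefix, "wikipedia"
    return prefix, SUFFIX_TO_PROJECT.get(suffix, "unknown")


def parse_line(line):
    """Parse one raw line, assumed to be '<project> <title> <count> <bytes>'."""
    code, title, count, size = line.split()
    prefix, project = split_project(code)
    return prefix, project, title, int(count), int(size)
```

A line that does not have exactly four fields raises `ValueError`, which is probably the safer behaviour when the real layout differs from the assumed one.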
I've been getting various feedback lately from the non-wiki world, where people use this data for popularity ranking of various bits.
BR,
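The daily packages described in item 2 (daily instead of hourly aggregation, with pages under 10 daily visits left out) and the popularity ranking mentioned above could be sketched as follows. This is only an illustration of the stated rules, not the actual build script. The in-memory input (one dict of title to views per hour) and the names `aggregate_daily` and `rank` are hypothetical.

```python
from collections import Counter

MIN_DAILY_VIEWS = 10  # threshold stated in the announcement


def aggregate_daily(hourly_counts):
    """Sum per-page hourly counts and drop pages below the threshold.

    hourly_counts: iterable of dicts mapping page title -> views in one hour
    (a hypothetical in-memory form of the hourly files).
    """
    daily = Counter()
    for hour in hourly_counts:
        daily.update(hour)  # adds this hour's views to the running totals
    return {page: n for page, n in daily.items() if n >= MIN_DAILY_VIEWS}


def rank(daily):
    """Return (title, views) pairs, most viewed first, ties by title."""
    return sorted(daily.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rank(aggregate_daily([{"A": 6, "B": 3}, {"A": 5, "B": 6}]))` keeps only page A (11 views), since B reaches just 9.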
On Sat, May 17, 2008 at 1:29 PM, Domas Mituzas midom.lists@gmail.com wrote:
(...)
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Nice to see them all! Search statistics would also be nice. Currently only Special:Search/* can be found, whereas most searches go through index.php?title=Special:Search=&search=xyz or Special:Search?search=xyz.
Bryan
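To make the three search URL shapes mentioned above concrete, here is a small Python sketch that pulls the search term out of each of them. It is only an illustration. The `/wiki/` and `/w/index.php` path prefixes and the function name `search_term` are assumptions, and the stray `=` after `Special:Search` in the message is assumed to be a typo for `title=Special:Search&search=xyz`.

```python
from urllib.parse import parse_qs, unquote, urlsplit

SEARCH_PAGE = "Special:Search"


def search_term(url):
    """Return the search term from a request URL, or None if it is not a search."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path_title = None
    if parts.path.startswith("/wiki/"):
        path_title = unquote(parts.path[len("/wiki/"):])
    # Prefer the title in the path; fall back to the title= query parameter.
    title = path_title or query.get("title", [None])[0]
    if title is None:
        return None
    if title.startswith(SEARCH_PAGE + "/"):
        return title[len(SEARCH_PAGE) + 1:]  # Special:Search/xyz
    if title == SEARCH_PAGE and "search" in query:
        return query["search"][0]  # ...?search=xyz
    return None
```

Only the first shape keeps the term inside the path, which is consistent with it being the only one visible in title-based counts.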
Domas Mituzas wrote:
> Helloes,
> There're few new things at http://dammit.lt/wikistats/
> (...)
Good :)
> - Second number is now actually bytes, in case anyone is interested :)
I don't think so. The second number is always the same as the first one, so something is wrong there. It isn't your script, though, as the numbers are the same in the pagecounts files too.
Wikistats highlights some rather bizarre IE activity: grep for IE60Fixes.css or shared.css. I haven't managed to reproduce it, though.
Also, some article names end in &action=edit, &action=print, &action=history, &redlink=1... I guess the filter has some problems extracting titles from queries to index.php.
Hi!
> I don't think so. The second number is always the same as the first one, so something is wrong there. It isn't your script, though, as the numbers are the same in the pagecounts files too.
Oh, the numbers are different now. It was my script being lazy, but the new one has them all.
> Wikistats highlights some rather bizarre IE activity: grep for IE60Fixes.css or shared.css. I haven't managed to reproduce it, though.
It's just buggered JavaScript somewhere; we never reproduced it, but it is constant :)
> Also, some article names end in &action=edit, &action=print, &action=history, &redlink=1... I guess the filter has some problems extracting titles from queries to index.php.
No, it doesn't have any problems: a question mark terminates the title, and only /wiki/Blah titles are accepted. If anything wrong shows up there, it is because the request already arrives at our servers wrong. :)
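The rule described here (only /wiki/Blah paths are accepted, and a question mark terminates the title) can be sketched in a few lines of Python. The real filter is not shown in this thread, so this is only a sketch of the stated rule, and the name `extract_title` is made up for the example.

```python
WIKI_PREFIX = "/wiki/"


def extract_title(request_path):
    """Return the counted title for a request path, or None if it is rejected."""
    if not request_path.startswith(WIKI_PREFIX):
        return None  # e.g. /w/index.php?title=... is not counted at all
    # Everything from the first question mark onwards is dropped.
    title = request_path[len(WIKI_PREFIX):].split("?", 1)[0]
    return title or None
```

Under this rule a well-formed `/wiki/Blah?action=edit` is counted as `Blah`, while a malformed request such as `/wiki/Blah&action=edit` (no question mark) is counted under the title `Blah&action=edit`, which would explain the odd names reported above.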