Finally, new wikistats.
It took a couple of weekends to get the scripts up to date for the new database format.
You may have to refresh the page in your browser (Ctrl-F5) to see the new stats. Stats are generated from the newest dump (March 9).
The layout has been improved in some places (newest stats on top, language names in comparison tables).
New features:
Record counts per namespace: http://en.wikipedia.org/wikistats/EN/TablesWikipediaEN.htm#namespaces
Percentage of categorised articles (same URL as above)
Hierarchical category trees per Wikipedia (some are huge!): http://en.wikipedia.org/wikistats/EN/CategoryOverviewIndex.htm
Not entirely new but not yet advertised here: http://en.wikipedia.org/wikistats/EN/TimeLinesIndex.htm
EasyTimeline charts are collected per Wikipedia and listed together with the script code. This may serve as a source of inspiration and help with learning the syntax. It can also help to find real gems on other Wikipedias that deserve to be translated. Although starting a timeline from scratch is not completely trivial, expanding, correcting or especially translating an existing chart is really where the plug-in earns its name.
Tech notes on script update:
Deciphering the serialized compressed info was not a major hurdle, although the Perl equivalent (http://hurring.com/code/perl/serialize/) was unusable: it is way too slow (it goes through a state machine for each character), so I had to cook something myself.
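To illustrate why a per-character state machine is not needed, here is a minimal sketch in Python (not the Perl code used for wikistats). It assumes the data is in PHP's serialize() format and already decompressed; the function names are made up for this example. Because every string carries a byte-length prefix, the parser can slice strings out directly instead of stepping through them one character at a time.

```python
def php_unserialize(data: bytes):
    """Decode one PHP-serialized value from bytes (illustrative sketch)."""
    value, pos = _read_value(data, 0)
    if pos != len(data):
        raise ValueError("trailing bytes after serialized value")
    return value


def _read_until(data, pos, delim):
    """Return (bytes up to delim, position just after delim)."""
    end = data.index(delim, pos)
    return data[pos:end], end + 1


def _read_pairs(data, pos, count):
    """Read `count` key/value pairs starting at '{'; return (dict, pos after '}')."""
    pos += 1  # skip '{'
    result = {}
    for _ in range(count):
        key, pos = _read_value(data, pos)
        val, pos = _read_value(data, pos)
        result[key] = val
    return result, pos + 1  # skip '}'


def _read_value(data, pos):
    tag = data[pos:pos + 1]
    if tag == b"N":  # null: N;
        return None, pos + 2
    if tag in (b"i", b"b", b"d"):  # scalars: i:42;  b:1;  d:1.5;
        raw, pos = _read_until(data, pos + 2, b";")
        if tag == b"i":
            return int(raw), pos
        if tag == b"b":
            return raw == b"1", pos
        return float(raw), pos
    if tag == b"s":  # string: s:<byte length>:"<bytes>";
        raw_len, pos = _read_until(data, pos + 2, b":")
        start = pos + 1  # skip opening quote
        end = start + int(raw_len)
        # One slice for the whole string, no per-character work.
        return data[start:end], end + 2  # skip closing quote and ';'
    if tag == b"a":  # array: a:<count>:{<key><value>...}
        raw_count, pos = _read_until(data, pos + 2, b":")
        return _read_pairs(data, pos, int(raw_count))
    if tag == b"O":  # object: O:<len>:"<class>":<count>:{...}
        raw_len, pos = _read_until(data, pos + 2, b":")
        start = pos + 1
        end = start + int(raw_len)
        class_name = data[start:end]
        raw_count, pos = _read_until(data, end + 2, b":")
        props, pos = _read_pairs(data, pos, int(raw_count))
        return {"class": class_name, "props": props}, pos
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos}")
```

Strings come back as bytes, since the length prefix counts bytes, not characters. Decompression, references and other less common type tags are left out of this sketch.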
Keeping everything within reasonable memory boundaries was more difficult. Wherever possible, data are written to disk in several bins (e.g. one intermediate file per month of history), sorted per bin, then merged before readback.
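The bin approach can be sketched roughly as follows, in Python rather than the Perl of the actual scripts. The record layout (a 14-digit timestamp plus a title), the file naming and the function names are all assumptions made for this example. Only one bin has to fit in memory at a time during sorting, and the merge step streams from the sorted files.

```python
import heapq
import os


def write_bins(records, workdir):
    """Append each (timestamp, title) record to a per-month bin file."""
    handles = {}
    try:
        for timestamp, title in records:
            month = timestamp[:6]  # YYYYMM prefix selects the bin
            if month not in handles:
                path = os.path.join(workdir, f"bin_{month}.txt")
                handles[month] = open(path, "a", encoding="utf-8")
            handles[month].write(f"{timestamp}\t{title}\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(os.path.join(workdir, f"bin_{m}.txt") for m in handles)


def sort_bin(path):
    """Sort one bin in memory and write it back; only this bin is loaded."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    lines.sort()
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


def merged_readback(paths):
    """Stream records from all sorted bins in overall sorted order."""
    files = [open(p, encoding="utf-8") for p in paths]
    try:
        # heapq.merge keeps only one pending line per bin in memory.
        for line in heapq.merge(*files):
            timestamp, title = line.rstrip("\n").split("\t", 1)
            yield timestamp, title
    finally:
        for f in files:
            f.close()
```

A typical run would call write_bins once over the dump, then sort_bin on each returned path, then iterate over merged_readback. With bins keyed on a prefix of the sort key, as here, plain concatenation in bin order would also work; the merge is the more general form.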
Erik Zachte
Hi,
On Monday 21 March 2005 06:25, Erik Zachte wrote:
Finally, new wikistats.
Oh great! Thanks a lot for this. It is very useful and also fun.
Erik Zachte
Regards, Yann
On Mon, 21 Mar 2005 06:25:08 +0100, Erik Zachte <e.p.zachte@chello.nl> wrote:
Finally, new wikistats.
My hiero! Looks fantastic.
It took a couple of weekends to get the scripts up to date for the new database format.
I hope you're preparing in advance for 1.5. ;-)
Hierarchical category trees per Wikipedia (some are huge!): http://en.wikipedia.org/wikistats/EN/CategoryOverviewIndex.htm
Yes, huge... we need to find a better way of displaying these. There are some interesting tree-mapping packages out there...
Keeping everything within reasonable memory boundaries was more difficult. Wherever possible, data are written to disk in several bins (e.g. one intermediate file per month of history), sorted per bin, then merged before readback.
Erik Zachte
How long does calculation take now? If there were a machine dedicated to stats of various kinds, with its own mirror of the db, could this be done more efficiently as a running-total, updated whenever the db was updated?
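As a rough illustration of the running-total idea, here is a Python sketch. It assumes a hypothetical feed of edit events from the mirror; the event fields and the class name are invented for this example and do not correspond to any existing interface. Each event updates the counters in place, so nothing is recomputed from the full history.

```python
from collections import Counter


class RunningStats:
    """Counters updated per edit event instead of recomputed from a dump."""

    def __init__(self):
        self.edits_per_month = Counter()
        self.pages_per_namespace = Counter()

    def apply(self, event):
        # event is a hypothetical dict with keys:
        #   "timestamp" (YYYYMMDDHHMMSS string), "namespace" (int),
        #   "is_new_page" (bool)
        self.edits_per_month[event["timestamp"][:6]] += 1
        if event["is_new_page"]:
            self.pages_per_namespace[event["namespace"]] += 1
```

Totals that depend on the current state of every article (for instance anything based on article size) would need more bookkeeping than this, such as remembering the previous value per page.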
wikitech-l@lists.wikimedia.org