[Wikipedia-l] New Wikistats

Erik Zachte e.p.zachte at chello.nl
Mon Mar 21 05:25:08 UTC 2005


Finally, new wikistats.

It took a couple of weekends to get the scripts up to date for the new
database format.

You may have to refresh the page in your browser (Ctrl-F5) to see the new
stats. They are generated from the newest dump (March 9).

The layout has been improved in some places (newest stats on top, language
names in comparison tables).

New features:

Record counts per namespace:
http://en.wikipedia.org/wikistats/EN/TablesWikipediaEN.htm#namespaces

Percentage of categorised articles (same URL as above)

Hierarchical category trees per Wikipedia (some are huge!):
http://en.wikipedia.org/wikistats/EN/CategoryOverviewIndex.htm

Not entirely new but not yet advertised here:
http://en.wikipedia.org/wikistats/EN/TimeLinesIndex.htm

EasyTimeline charts are collected per Wikipedia and listed together with the
script code. This may serve as a source of inspiration and help to learn the
syntax. Also this can help to find real gems on other Wikipedias that
deserve to be translated. Although starting a timeline from scratch is not
completely trivial, expanding, correcting or certainly translating an
existing chart is really where the plug-in earns its name.

Tech notes on script update:

Deciphering the serialized compressed info was not a major hurdle, although
the Perl equivalent (http://hurring.com/code/perl/serialize/) was unusable:
it is way too slow (it goes through a state machine for each character), so
I had to cook up something myself.
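
To illustrate why a per-character state machine is avoidable here: PHP's
serialize() format prefixes every string with its byte length, so a parser
can jump over string bodies with a slice instead of inspecting each
character. The following is only a minimal sketch in Python, not the Perl
code used for wikistats. The function names are hypothetical, and it assumes
the input is already decompressed and uses only the N, b, i, d, s and a
types (no objects).

```python
def php_unserialize(data: bytes):
    """Parse a subset of PHP's serialize() format: N, b, i, d, s, a."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing bytes after serialized value")
    return value


def _parse(data: bytes, pos: int):
    """Return (value, position just past the value)."""
    tag = data[pos:pos + 1]
    if tag == b"N":                          # N;
        return None, pos + 2
    if tag in (b"b", b"i", b"d"):            # b:1;  i:42;  d:0.5;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode("ascii")
        if tag == b"b":
            return raw == "1", end + 1
        return (int(raw) if tag == b"i" else float(raw)), end + 1
    if tag == b"s":                          # s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon].decode("ascii"))
        start = colon + 2                    # skip :"
        # One slice takes the whole string body; no per-character work.
        return data[start:start + length], start + length + 2  # skip ";
    if tag == b"a":                          # a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon].decode("ascii"))
        pos = colon + 2                      # skip :{
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            value, pos = _parse(data, pos)
            result[key] = value
        return result, pos + 1               # skip }
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos}")
```

Strings come back as bytes because the length prefix counts bytes, not
characters; decoding is left to the caller.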

Keeping everything within reasonable memory boundaries was more difficult.
Wherever possible, data are written to disk in several bins (e.g. one
intermediate file per month of history), sorted per bin, then merged before
readback.
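
The bin, sort and merge steps can be sketched roughly as follows. This is
an illustration in Python, not the actual wikistats Perl code. The function
names and the file layout are hypothetical, and it assumes each record is a
(timestamp, payload) pair with a fixed-width 'YYYYMMDDHHMMSS' timestamp and
a payload free of tabs and newlines. Only one bin is held in memory at a
time while sorting, and the final merge streams from the sorted files.

```python
import heapq
import os


def write_bins(records, workdir):
    """Append each record to a per-month file; return bin paths in month order."""
    handles = {}
    try:
        for timestamp, payload in records:
            month = timestamp[:6]            # 'YYYYMM' selects the bin
            if month not in handles:
                path = os.path.join(workdir, f"bin_{month}.txt")
                handles[month] = open(path, "w", encoding="utf-8")
            handles[month].write(f"{timestamp}\t{payload}\n")
    finally:
        for handle in handles.values():
            handle.close()
    return [os.path.join(workdir, f"bin_{m}.txt") for m in sorted(handles)]


def sort_bin(path):
    """Sort one bin in memory and write it back in place."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    lines.sort()                             # fixed-width timestamp leads each line
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


def merged_lines(paths):
    """Lazily merge already-sorted bin files into one sorted stream of lines."""
    files = [open(p, encoding="utf-8") for p in paths]
    try:
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
```

With bins keyed on the month of the sort key, the bins cover disjoint
ranges and could simply be read back in month order; heapq.merge is used
here because it also works when the bin key and the sort key differ.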

Erik Zachte




