I have written a Perl script that parses the SQL dumps (cur & old) and generates an HTML file containing, for each month since the project started:
1 = number of Wikipedians who contributed at least 10 edits
2 = increase in (1) in the past month
3 = Wikipedians who contributed > 5 edits in the past month, the same for > 100
4 = total number of articles according to the new (link) counting system
5 = mean number of revisions per article
6 = mean size of an article in bytes
7 = total edits in the past month
8 = combined size of all articles
9 = total number of links
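For illustration only, here is a stripped-down sketch of the monthly tallying, not the real script: it assumes the revision timestamps appear in the dump as quoted 14-digit YYYYMMDDHHMMSS strings (the MediaWiki timestamp format) and simply counts them per month:

#!/usr/bin/perl
# Sketch: tally revisions per month from an uncompressed SQL dump.
# Every quoted 14-digit timestamp in an INSERT statement counts as one edit;
# the real script parses the full row layout and computes many more figures.
use strict;
use warnings;

my %edits_per_month;

while (my $line = <>) {
    while ($line =~ /'(\d{4})(\d{2})\d{8}'/g) {    # 'YYYYMMDDHHMMSS'
        $edits_per_month{"$1-$2"}++;
    }
}

for my $month (sort keys %edits_per_month) {
    print "$month: $edits_per_month{$month} edits\n";
}

Run it as e.g. "perl tally.pl old.sql".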
The reports also list the most active Wikipedians, ordered by number of edits.
Please note that the script produces historical growth figures per Wikipedia based on the >> new (link) counting system << right from the first month.
See results for de:, fr:, nl:, and eo: (and the Perl script itself) at http://members.chello.nl/epzachte/Wikipedia/Statistics
A. Feedback is appreciated
B. I propose to run this script weekly on the new SQL dumps for all Wikipedias and put the resulting HTML files in a public folder.
C. I'd like to test the script with the huge English SQL dumps, but I can't download a 1600 MB file without transmission errors. Could someone please split the file into 50 MB chunks, generate MD5 checksums, put it all in a (public!) temp folder, and inform me? Something along the lines of the sketch below would do. Thanks! xxx@chello.nl (xxx=epzachte)
D. Open issues: Unicode support, possibly further optimization for the English version (e.g. the sort)
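To be concrete about the splitting in C: something along these lines would do (just a sketch; the file names are examples, and it uses Perl's core Digest::MD5 module, but any split + md5sum approach is fine):

#!/usr/bin/perl
# Sketch: cut a big dump into 50 MB chunks and print an MD5 sum per chunk,
# so every piece can be verified after download. File names are examples.
use strict;
use warnings;
use Digest::MD5;

my $infile     = shift or die "usage: $0 dump.sql\n";
my $chunk_size = 50 * 1024 * 1024;                  # 50 MB per chunk

open my $in, '<', $infile or die "cannot open $infile: $!";
binmode $in;

my $part = 0;
while (read($in, my $buf, $chunk_size)) {
    my $outfile = sprintf "%s.part%03d", $infile, $part++;
    open my $out, '>', $outfile or die "cannot write $outfile: $!";
    binmode $out;
    print $out $buf;
    close $out;
    # same output format as md5sum, so the chunks can be checked with "md5sum -c"
    printf "%s  %s\n", Digest::MD5->new->add($buf)->hexdigest, $outfile;
}
close $in;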
Erik Zachte
----
Ad B. The English version will run for a while, but at least it will not tie up the live database. I expect it will take under an hour; the German files (650 MB) were processed in 4.5 minutes on my 1.2 GHz PC.
Ad C. As a test I downloaded the same 100 MB TomeRaider file eight times; the checksum failed four times. FTP is too unreliable for huge files.
Erik Z.
Erik Zachte wrote:
I have written a Perl script that parses the SQL dumps (cur & old) and generates an HTML file containing, for each month since the project started:
Nice work, and all without using an actual database. Impressive. I have split up the English OLD tarball into 50 MB files and put them up at
http://pliny.wikipedia.org/tarballs/tmp/
together with the md5sums. Please let me know when you have downloaded them so I can remove the files.
However, I find it a bit strange that you get these transmission errors. Normally TCP/IP should take care of that, no? Have you experienced this before?
Regards,
Erik
Erik Zachte wrote:
I'd like to test the script with the huge English SQL dumps, but I can't download a 1600 MB file without transmission errors.
You should use a download manager that can resume your download in case of an error. Try, for example, http://www.getright.com/ (but that's just an example; I don't favour any particular one).
Timwi
Erik Zachte wrote:
I have written a Perl script that parses the SQL dumps (cur & old) and generates an HTML file containing, for each month since the project started.
Thanks a lot! I prefer developing features like this outside the server anyway.
I wrote a German translation; feel free to modify it :-) http://meta.wikipedia.org/wiki/Statistics_script
Jakob