I have written a Perl script that parses the SQL dumps (cur & old) and generates an HTML file containing, for each month since the project started:
1 = number of wikipedians who contributed at least 10 edits
2 = increase in (1) in the past month
3 = wikipedians who contributed > 5 edits in the past month, likewise > 100 edits
4 = total number of articles according to the new (link) counting system
5 = mean number of revisions per article
6 = mean size of an article in bytes
7 = total edits in the past month
8 = combined size of all articles
9 = total number of links
The reports also list the most active wikipedians, ordered by number of edits.
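To give an idea of the kind of calculation involved: the sketch below is my own illustration, not the actual script (the hash name and sample data are made up). It shows how a threshold count like metric 1 could be taken once the edits per wikipedian have been tallied from cur & old.

  # Illustration only: %edit_count stands for the per-wikipedian edit tally
  # gathered while parsing the cur & old dumps. Sample data, not real figures.
  use strict;
  my %edit_count = ( Alice => 25, Bob => 3, Carol => 112 );
  # Metric 1: wikipedians who contributed at least 10 edits
  my $active = grep { $edit_count{$_} >= 10 } keys %edit_count;
  print "Wikipedians with at least 10 edits: $active\n";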
Please note that the script produces historical growth figures per Wikipedia based on the >> new (link) counting system << right from the first month.
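For clarity, the link-based test per page amounts to roughly the following. This is my own illustration, assuming an article counts when it is not a redirect and contains at least one internal link; it is not the exact code from the script.

  # Assumption: an article counts when it is not a redirect and contains
  # at least one internal [[link]]. Illustration only, not the script itself.
  use strict;
  sub is_countable_article {
      my ($wikitext) = @_;
      return 0 if $wikitext =~ /^\s*#redirect/i;     # redirects do not count
      return $wikitext =~ /\[\[[^\]]+\]\]/ ? 1 : 0;  # needs one internal link
  }
  print is_countable_article("#REDIRECT [[Foo]]") ? "counts\n" : "does not count\n";
  print is_countable_article("Berlin is the capital of [[Germany]].") ? "counts\n" : "does not count\n";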
See the results for de: fr: nl: eo: (and the Perl script itself) at http://members.chello.nl/epzachte/Wikipedia/Statistics
A. Feedback is appreciated.
B. I propose running this script weekly on the new SQL dumps for all Wikipedias and putting the resulting HTML files in a public folder.
C. I'd like to test the script with the huge English SQL dumps, but I can't download a 1600 MB file without transmission errors. Could someone please split the file into 50 MB chunks, generate MD5 checksums, put everything in a temp folder (public!) and let me know? A rough sketch of the splitting step is at the bottom of this mail. Thanks! xxx@chello.nl (xxx=epzachte)
D. Open issues: Unicode support, possibly further optimization for the English version (e.g. the sort step).
Erik Zachte
---- Ad B. The English version will run for a while, but at least it will not tie up the live database. I expect it to take under an hour; the German files (650 MB) were processed in 4.5 minutes on my 1.2 GHz PC.
Ad C. As a test I downloaded the same 100 MB TomeRaider file eight times; four times the checksum failed. FTP is too unreliable for files this large.
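For whoever can take up point C, here is a rough sketch of the split-and-checksum step I have in mind. The dump file name 'en_cur.sql' and the part naming are only placeholders, not the real file names on the server.

  # Sketch only: split a dump into 50 MB chunks and print an MD5 line per chunk
  # (same format as md5sum output). File names are placeholders.
  use strict;
  use Digest::MD5 qw(md5_hex);
  my $chunk_size = 50 * 1024 * 1024;                   # 50 MB per chunk
  open my $in, '<', 'en_cur.sql' or die "cannot open dump: $!";
  binmode $in;
  my $part = 0;
  while (read($in, my $buffer, $chunk_size)) {
      my $name = sprintf 'en_cur.sql.%03d', $part++;
      open my $out, '>', $name or die "cannot write $name: $!";
      binmode $out;
      print $out $buffer;
      close $out;
      print md5_hex($buffer), "  $name\n";             # checksum line per chunk
  }
  close $in;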