I have written a Perl script that parses the SQL dumps (cur & old) and generates an HTML file containing, for each month since the project started:
1 = number of wikipedians who contributed at least 10 edits
2 = increase in (1) in the past month
3 = wikipedians who contributed > 5 edits in the past month, likewise > 100 edits
4 = total number of articles according to the new (link) counting system
5 = mean number of revisions per article
6 = mean size of an article in bytes
7 = total edits in the past month
8 = combined size of all articles
9 = total number of links
The reports also list the most active wikipedians, ordered by number of edits.
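To give an idea of the kind of calculation involved: the sketch below is my own illustration, not the actual script (the hash name and sample data are made up). It shows how a threshold count like metric 1 could be taken once the edits per wikipedian have been tallied from cur & old.

  # Illustration only: %edit_count stands for the per-wikipedian edit tally
  # gathered while parsing the cur & old dumps. Sample data, not real figures.
  use strict;
  my %edit_count = ( Alice => 25, Bob => 3, Carol => 112 );
  # Metric 1: wikipedians who contributed at least 10 edits
  my $active = grep { $edit_count{$_} >= 10 } keys %edit_count;
  print "Wikipedians with at least 10 edits: $active\n";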
Please note that the script produces historical growth figures per Wikipedia based on the >> new (link) counting system << right from the first month.
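For clarity, the link-based test per page amounts to roughly the following. This is my own illustration, assuming an article counts when it is not a redirect and contains at least one internal link; it is not the exact code from the script.

  # Assumption: an article counts when it is not a redirect and contains
  # at least one internal [[link]]. Illustration only, not the script itself.
  use strict;
  sub is_countable_article {
      my ($wikitext) = @_;
      return 0 if $wikitext =~ /^\s*#redirect/i;     # redirects do not count
      return $wikitext =~ /\[\[[^\]]+\]\]/ ? 1 : 0;  # needs one internal link
  }
  print is_countable_article("#REDIRECT [[Foo]]") ? "counts\n" : "does not count\n";
  print is_countable_article("Berlin is the capital of [[Germany]].") ? "counts\n" : "does not count\n";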
See the results for de: fr: nl: eo: (and the Perl script itself) at http://members.chello.nl/epzachte/Wikipedia/Statistics
A. Feedback is appreciated.
B. I propose running this script weekly on the new SQL dumps for all Wikipedias and putting the resulting HTML files in a public folder.
C. I'd like to test the script with the huge English SQL dumps, but I can't download a 1600 MB file without transmission errors. Could someone please split the file into 50 MB chunks, generate MD5 checksums, put everything in a temp folder (public!) and let me know? A rough sketch of the splitting step is at the bottom of this mail. Thanks! xxx@chello.nl (xxx=epzachte)
D. Open issues: Unicode support, possibly further optimization for the English version (e.g. the sort step).
Erik Zachte
---- Ad B. The English version will run for a while, but at least it will not tie up the live database. I expect it to take under an hour; the German files (650 MB) were processed in 4.5 minutes on my 1.2 GHz PC.
Ad C. As a test I downloaded the same 100 MB TomeRaider file eight times; four times the checksum failed. FTP is too unreliable for files this large.
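For whoever can take up point C, here is a rough sketch of the split-and-checksum step I have in mind. The dump file name 'en_cur.sql' and the part naming are only placeholders, not the real file names on the server.

  # Sketch only: split a dump into 50 MB chunks and print an MD5 line per chunk
  # (same format as md5sum output). File names are placeholders.
  use strict;
  use Digest::MD5 qw(md5_hex);
  my $chunk_size = 50 * 1024 * 1024;                   # 50 MB per chunk
  open my $in, '<', 'en_cur.sql' or die "cannot open dump: $!";
  binmode $in;
  my $part = 0;
  while (read($in, my $buffer, $chunk_size)) {
      my $name = sprintf 'en_cur.sql.%03d', $part++;
      open my $out, '>', $name or die "cannot write $name: $!";
      binmode $out;
      print $out $buffer;
      close $out;
      print md5_hex($buffer), "  $name\n";             # checksum line per chunk
  }
  close $in;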