As my first toolserver project I've made some statistics of the activity in our CVS repository. I made aggregated statistics for the whole repository[1] as well as statistics for individual modules[2]. You'll see per-committer statistics there, LOC statistics, file & directory statistics and changelogs. Additionally I've put the raw CVS changelogs up there too[4], and finally the whole thing can be downloaded[3] for local viewing & hacking; just make sure you're *ahem* adhering to the license[5].
1. http://tools.wikimedia.de/~avar/cvs/html/all/
2. http://tools.wikimedia.de/~avar/cvs/html/
3. http://tools.wikimedia.de/~avar/cvs/cvs.tar.gz
4. http://tools.wikimedia.de/~avar/cvs/log/
5. http://tools.wikimedia.de/~avar/cvs/COPYING
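Not the scripts actually behind the pages above, but for anyone who wants to hack on the raw changelogs in [4]: the per-committer and LOC numbers can be derived with something like the sketch below. It counts file revisions per author (CVS has no changesets, so a multi-file commit is counted once per file) and assumes the usual "date: ...;  author: ...;  lines: +A -D" revision lines, which vary slightly between CVS versions.

#!/usr/bin/env python
# Rough sketch: aggregate per-committer revision counts and LOC deltas
# from raw `cvs log` output such as the dumps under [4].  Not the code
# behind the HTML pages, just an illustration.
import re
import sys
from collections import defaultdict

# Matches revision lines of the (old-style) form:
#   date: 2005/12/27 23:28:19;  author: avar;  state: Exp;  lines: +5 -1
# Newer CVS versions use a different date format and a trailing ';'.
REVISION_RE = re.compile(
    r"^date:.*?;\s*author:\s*(?P<author>[^;]+);"
    r".*?(?:lines:\s*\+(?P<added>\d+)\s+-(?P<removed>\d+))?;?\s*$"
)

def aggregate(stream):
    commits = defaultdict(int)   # author -> number of file revisions
    loc = defaultdict(int)       # author -> net lines added
    for line in stream:
        m = REVISION_RE.match(line.strip())
        if not m:
            continue
        author = m.group("author").strip()
        commits[author] += 1
        if m.group("added") is not None:
            loc[author] += int(m.group("added")) - int(m.group("removed"))
    return commits, loc

if __name__ == "__main__":
    commits, loc = aggregate(sys.stdin)
    for author in sorted(commits, key=commits.get, reverse=True):
        print("%-20s %6d revisions  %+8d lines" % (author, commits[author], loc[author]))

Feed it the concatenated log files, e.g. "cat log/* | python cvsstats.py".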
Hi,
Ævar Arnfjörð Bjarmason wrote:
> As my first toolserver project I've made some statistics of the activity in our CVS repository. I made aggregated statistics for the whole repository[1] as well as statistics for individual modules[2]. You'll see per-committer statistics there, LOC statistics, file & directory statistics and changelogs. Additionally I've put the raw CVS changelogs up there too[4], and finally the whole thing can be downloaded[3] for local viewing & hacking; just make sure you're *ahem* adhering to the license[5].
Thanks for this CVS analysis. The distribution of activity per author is typical, as are the other distributions. If you look around for general tools and projects in quantitative measurement of open source software, you'll stumble upon the Libresoft group at the Universidad Rey Juan Carlos, Madrid:
For instance, have a look at:
http://libresoft.dat.escet.urjc.es/cvsanal/kde3-cvs/index.php?menu=Statistic...
Two weeks ago I was contacted by Felipe Ortega and Jesus Gonzalez Barahona of this group (this mail is also forwarded to them). They would like to do other analyses of Wikipedia. I'd like to hear your opinions about this. Here are two quotes from them describing what they want to do:
So, our proposal is that, if Wikipedia admins allow us to participate, we are very interested in the design and implementation of a log analysis system for Wikipedia (both Squids and Apaches). As the amount of information that the system could generate over a significant period of time (about 2 TBytes) is too close to our hardware limits, we may propose to randomly pick a representative set of samples from the Apache and Squid logs (about 10,000 per hour).
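To make the numbers in that first quote concrete: a fixed-size per-hour sample can be taken in a single pass with reservoir sampling, so the full ~2 TB never has to be stored anywhere. This is only a sketch of the idea, not their actual plan; in particular the hour key used below is a placeholder and would have to be adapted to the real Apache/Squid log formats.

#!/usr/bin/env python
# Sketch of the per-hour sampling idea from the quoted proposal: keep a
# fixed-size random sample of each hour's log lines in one pass
# (reservoir sampling), so the full log stream never has to be stored.
import random
import sys

SAMPLE_SIZE = 10000  # roughly the "10,000 per hour" from the proposal

def sample_stream(lines, hour_of):
    """Yield (hour_key, sampled_lines) pairs, one reservoir per hour."""
    reservoir, seen, current = [], 0, None
    for line in lines:
        hour = hour_of(line)
        if hour != current:
            if current is not None:
                yield current, reservoir
            reservoir, seen, current = [], 0, hour
        seen += 1
        if len(reservoir) < SAMPLE_SIZE:
            reservoir.append(line)
        else:
            # Replace an existing entry with probability SAMPLE_SIZE/seen,
            # which keeps every line of the hour equally likely to survive.
            j = random.randint(0, seen - 1)
            if j < SAMPLE_SIZE:
                reservoir[j] = line
    if current is not None:
        yield current, reservoir

if __name__ == "__main__":
    # Placeholder hour key (first two whitespace-separated fields); real
    # Apache/Squid logs need their own timestamp parsing here.
    for hour, sample in sample_stream(sys.stdin, lambda l: tuple(l.split()[:2])):
        sys.stderr.write("%s: kept %d lines\n" % (hour, len(sample)))
        sys.stdout.writelines(sample)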
...
Felipe is ready to help you to instrument the Wikipedia Squids and Apaches, in a way which (we hope) won't have an impact on Wikipedia reliability or performance. However, maybe that instrumenting is not necessary, depending on the log information available. Of course, the work of anonymizing logs before analysis, etc. would be done by us. And also of course, we would contribute back the results of the work, hopefully with a system to measure the performance of the system, which can help to identify bottlenecks and problems.
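On the anonymization they mention in the second quote, one common approach (a sketch, not necessarily what they have in mind) is to replace client IPs with a keyed hash before the logs leave the servers: requests from the same address still correlate within one run, but the addresses themselves become unrecoverable once the key is discarded. The IPv4 regex and the choice of HMAC-SHA1 below are illustrative assumptions.

#!/usr/bin/env python
# Sketch of an anonymization pass: replace client IPs with a keyed hash so
# repeat requests still correlate within one run, while the real addresses
# cannot be recovered once the key is thrown away.
import hashlib
import hmac
import re
import sys

SECRET = b"rotate-and-then-discard-this-key"  # hypothetical per-run salt
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def pseudonym(ip):
    # Keyed hash: consistent mapping within one run, one-way without the key.
    return hmac.new(SECRET, ip.encode("ascii"), hashlib.sha1).hexdigest()[:12]

def anonymize(line):
    return IP_RE.sub(lambda m: pseudonym(m.group(0)), line)

if __name__ == "__main__":
    for line in sys.stdin:
        sys.stdout.write(anonymize(line))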
Maybe they can also collaborate on the toolserver (or on their own machines - I don't know what's better). By the way, at the 22C3 three days ago somebody who seems to collect mainframes offered us (Wikimedia) a private computer center in Germany with 40-processor machines and a 100TB tape library - any interest? ;-) I'll post the details later.
Greetings, Jakob