Hi again.
For anyone who didn't read my previous post: I'm a Telco Engineer and Assistant Professor at UAX University in Madrid (Spain), and I'm working on my Ph.D. (at URJC University) on topics related to Internet multimedia content distribution (currently focusing on Wikipedia).
You can visit:
http://libresoft.urjc.es/ for more info about our work, and
http://gsyc.escet.urjc.es/ for more info about our technical group.
After going through all the info I could find at meta.wikimedia.org, I still have several open questions. Maybe someone here could help me:
1.- What is the main bottleneck of the whole Wikimedia system nowadays? The Apaches? The MySQL database? Both?
2.- I've found no precise info in the Wikimedia MySQL tables about the size of the modifications that a given user made to an article. It would be very interesting to build some statistics about the shape of registered users' contributions (obviously, respecting the anonymity of users).
Our group has carried out several studies of these factors on many kinds of libre software projects.
3.- Finally, in addition to the Squid configs (I'm still looking for someone who could help me with this), I wonder if it would be possible to get some of the Apache logs, so I could process them and get info about:
a) The top-rated pages (by number of hits).
b) The average served page size.
This way, I could begin to think about a method to identify the most requested pages, as a starting point for a content distribution simulation framework on our grid computing development architecture.
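To make the request in point 3 more concrete, here is a minimal sketch of the kind of processing I have in mind. It assumes the logs are in the standard Apache Common Log Format, which may not be what the Wikimedia servers actually write. The function name `summarize` and the sample lines are made up for illustration only; they are not real log data.

```python
import re
from collections import Counter

# Assumption: Apache Common Log Format, e.g.
#   host ident user [timestamp] "METHOD /path PROTOCOL" status bytes
# The real Wikimedia log format may differ.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines, top_n=10):
    """Return (top pages by hits, average served size in bytes)."""
    hits = Counter()
    total_bytes = 0
    sized_responses = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits[match.group("path")] += 1
        size = match.group("size")
        if size != "-":  # "-" means no body was sent (e.g. a 304 reply)
            total_bytes += int(size)
            sized_responses += 1
    average = total_bytes / sized_responses if sized_responses else 0.0
    return hits.most_common(top_n), average


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 +0200] "GET /wiki/Main_Page HTTP/1.1" 200 1000',
    '192.0.2.2 - - [10/Oct/2005:13:55:37 +0200] "GET /wiki/Madrid HTTP/1.1" 200 3000',
    '192.0.2.3 - - [10/Oct/2005:13:55:38 +0200] "GET /wiki/Main_Page HTTP/1.1" 200 2000',
    '192.0.2.4 - - [10/Oct/2005:13:55:39 +0200] "GET /wiki/Spain HTTP/1.1" 304 -',
]

if __name__ == "__main__":
    top_pages, average_size = summarize(SAMPLE, top_n=3)
    for path, count in top_pages:
        print(count, path)
    print("average served size (bytes):", average_size)
```

Responses without a body are counted as hits but left out of the average size. Whether that is the right choice depends on what the simulation needs, so it is only one possible convention.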
Regards,
Felipe Ortega.
wikitech-l@lists.wikimedia.org