Hi again.
For anyone who didn't read my previous post: I'm a
Telecommunications Engineer, Assistant Professor at UAX
University in Madrid (Spain), and I'm working on my Ph.D.
(at URJC University) on topics related to Internet
multimedia content distribution (currently focusing on
Wikipedia).
You can visit:
http://libresoft.urjc.es/ for more info about our
work, and
http://gsyc.escet.urjc.es/ for more info about our
technical group.
After going through all the info I could find at
meta.wikimedia.org, I still have several open questions.
Perhaps someone here could help me:
1.- What is the main bottleneck of the whole Wikimedia
system nowadays? The Apaches? The MySQL database? Both?
2.- I've found no precise info in the Wikimedia MySQL
tables about the size of the modifications that a given
user made to an article. It would be very interesting to
build some statistics about the shape of registered
users' contributions (obviously, respecting the
anonymity of users).
Our group has carried out several studies of these
factors for many kinds of libre software projects.
3.- Finally, in addition to the Squid configs (I'm still
looking for someone who could help me with this), I
wonder if it would be possible to get some of the Apache
logs, to process them and get info about:
a) The most requested pages (by number of hits).
b) The average served page size.
This way, I could start devising a method to identify
the most requested pages, as a first step towards a
content distribution simulation framework on the grid
computing architecture we are developing.
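
To make clear what I would do with the logs, here is a
minimal sketch in Python. It assumes the logs follow the
standard Apache Common Log Format; I don't know what format
the Wikimedia Apaches actually write, so the regular
expression, the summarize() helper and the sample lines are
only illustrative placeholders, not real data.

```python
import re
from collections import Counter

# Assumption: Apache Common Log Format, i.e.
# host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) (\d+|-)'
)


def summarize(lines, top_n=10):
    """Return (top pages by hits, average served size in bytes).

    Only successful responses (status 200) are counted.
    """
    hits = Counter()
    total_bytes = 0
    sized = 0
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        path, status, size = m.groups()
        if status != "200":
            continue
        hits[path] += 1
        if size != "-":  # "-" means no body size was logged
            total_bytes += int(size)
            sized += 1
    average = total_bytes / sized if sized else 0.0
    return hits.most_common(top_n), average


# Made-up sample lines, only to show the expected input shape.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2006:00:00:01 +0000] "GET /wiki/Main_Page HTTP/1.1" 200 1000',
    '10.0.0.2 - - [01/Jan/2006:00:00:02 +0000] "GET /wiki/Madrid HTTP/1.1" 200 3000',
    '10.0.0.3 - - [01/Jan/2006:00:00:03 +0000] "GET /wiki/Main_Page HTTP/1.1" 200 2000',
    '10.0.0.4 - - [01/Jan/2006:00:00:04 +0000] "GET /wiki/Missing HTTP/1.1" 404 500',
]

if __name__ == "__main__":
    top_pages, avg_size = summarize(SAMPLE)
    print(top_pages)
    print(avg_size)
```

Note that nothing here needs client addresses, so the host
field could be stripped or anonymized before the logs are
handed over.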
Regards,
Felipe Ortega.