Hello folks. I posted the following on the 'Why Wikipedia Runs Slow' meta page, but thought it might get more exposure here on the list. My apologies if you've already discussed this in depth....
I was a developer a while back for (purportedly) the 17th busiest website in the world, and though the sysadmins were more directly involved in improving response speed, I ended up doing a lot of that work myself. The way Wikipedia is slow is very similar to the problems we had. The delays are almost all in the initial request for a new page; once the connection is made, content usually comes across rapidly.

This usually points to some sort of full queue in software, or a full queue caused by excessive connections on a single machine putting the hardware into a wait state. New URL requests are made to stand in line, sometimes because the setting for maximum simultaneous connections is too low, or because the setting is high enough but all RAM is consumed servicing current requests, etc. This may seem obvious, but it lets us de-emphasize other potential problems such as a bloated, overworked DB, bogged-down disk fetches, etc.

So, based on all this, I would say the greatest single improvement would be to set up some sort of simple DNS round robin (true load balancing could come later). I'm not sure what your current server setup is, but if you could have at least two Apache servers running on two machines, with one of them running the round robin algorithm, I think the majority of your response problems would disappear. Don't listen to those who say round robin is a naive approach. It's true that allocation of new connections is done in a "dumb" way (in a two-server setup it will just throw every other connection to the secondary webserver), but that's all you really need, I think. Suddenly each machine is servicing half the client connections and everything is fast...

Of course, maybe the reasons for your slowness are more complex, but based on what I can see from the client side, my suspicion is that a simple round robin would clear it all up, and that simply adding new Apache processes on new servers as you grow would make you at least 10 times faster during peak times than at present.

-- JDG --
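For anyone who wants to see the "dumb" allocation spelled out, here is a minimal Python sketch of round robin assignment. It is only an illustration, not how any particular DNS server or Apache setup implements it: the server names `web1` and `web2` and the helper `assign` are hypothetical, and real DNS round robin is blurred by resolver caching, so the split is rarely this exact in practice.

```python
from collections import Counter
from itertools import cycle

# Hypothetical server names, for illustration only.
SERVERS = ["web1", "web2"]


def assign(n_requests, servers):
    """Hand out n_requests connections in strict rotation.

    The rotation ignores how busy each server is, which is exactly
    the "dumb" behaviour described in the post. Returns a Counter
    mapping each server name to the number of connections it got.
    """
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in range(n_requests))


if __name__ == "__main__":
    # With two servers, every other connection goes to the second one.
    print(assign(1000, SERVERS))
```

Adding a third name to `SERVERS` spreads the same requests three ways, which is the "just add servers as you grow" idea in its simplest form.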
"Jim Guide" skribis:
Hello folks. I posted the following on the 'Why Wikipedia Runs Slow' meta page, but thought it might get more exposure here on the list. My apologies if you've already discussed this in depth....
Maybe you should look in the archives of the wikitech-l mailing list, not this one.
http://mail.wikipedia.org/pipermail/wikitech-l/
http://news.gmane.org/gmane.science.linguistics.wikipedia.technical
news:news.gmane.org/gmane.science.linguistics.wikipedia.technical
... and post your suggestions there.
Paul
wikipedia-l@lists.wikimedia.org