Hopefully I have the right list now...
I was a developer a while back for the (purportedly) 17th busiest website in the world, and though the sysadmins were more directly involved in improving response speed, I ended up doing a lot of that work myself.

The way Wikipedia is slow is very similar to the problems we had. The delays are almost all in the initial request for a new page; once the connection is made, content usually comes across rapidly. This usually points to a full queue somewhere: either a queue in software, or excessive connections on a single machine causing a hardware wait state. New URL requests are made to stand in line, sometimes because the setting for maximum simultaneous connections is too low, or because the setting is high enough but all RAM is consumed servicing current requests, etc. This may seem obvious, but it lets us de-emphasize other potential problems such as a bloated, overworked DB, bogged-down disk fetches, etc.

So, based on all this, I would say the greatest single improvement would be to set up some sort of simple DNS round robin (true load balancing could come later). I'm not sure what your current server setup is, but if you could have at least two Apache servers running on two machines, with one of them running the round-robin algorithm, I think the majority of your response problems would disappear.

Don't listen to those who say round robin is a naive approach. It's true that allocation of new connections is done in a "dumb" way (in a two-server setup it will just throw every other connection to the secondary webserver), but that's all you really need, I think. Suddenly each machine is servicing half the client connections and everything is fast...

Of course, maybe the reasons for your slowness are more complex. But based on what I can see from the client side, my suspicion is that a simple round robin would clear it all up, and that simply adding new Apache processes on new servers as you grow would make you at least 10 times faster during peak times than at present.

-- JDG --
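To make the "dumb" allocation concrete, here is a minimal sketch of strict round-robin assignment in Python. It only illustrates the rotation itself, not DNS or Apache configuration, and the backend names `apache-1` and `apache-2` are hypothetical placeholders for the two machines mentioned above.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend names; the message above only says "two Apache servers".
BACKENDS = ["apache-1", "apache-2"]


def round_robin(backends):
    """Return a function that hands out backends in strict rotation."""
    rotation = cycle(backends)

    def pick():
        # No load measurement at all: just take the next server in the cycle.
        return next(rotation)

    return pick


def distribute(n_connections, backends=BACKENDS):
    """Assign n_connections new connections and count how many each backend gets."""
    pick = round_robin(backends)
    return Counter(pick() for _ in range(n_connections))


if __name__ == "__main__":
    # With two backends, every other connection goes to the second server.
    print(distribute(1000))
```

With an even number of connections and two backends, each machine ends up with exactly half, which is the whole point of the approach: no knowledge of server load is needed to halve the number of connections each machine has to hold open.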
I generally agree with this analysis, and I also think that when we do move to a round-robin or similar load-balancing setup, we will have much more capacity than a linear prediction from adding one extra machine would suggest. As with JDG, this is based on my own experience, although not with the 17th busiest website in the world. :-)
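One way to see why the gain can be more than linear is a simple queueing model. The sketch below treats each web server as an independent M/M/1 queue, which is a simplifying assumption and not something measured on Wikipedia's servers; the service and arrival rates are made-up illustrative numbers.

```python
def mean_response_time(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 queue: W = 1 / (mu - lambda).

    Rates are in requests per second. If requests arrive at least as fast
    as they can be served, the queue grows without bound.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Illustrative numbers only (assumptions, not measurements).
SERVICE_RATE = 100.0   # requests/second one server can complete
PEAK_ARRIVALS = 90.0   # requests/second arriving at peak

# One server takes all the peak traffic.
one_server = mean_response_time(PEAK_ARRIVALS, SERVICE_RATE)

# Two servers, each receiving half the traffic via round robin.
two_servers = mean_response_time(PEAK_ARRIVALS / 2, SERVICE_RATE)

speedup = one_server / two_servers

if __name__ == "__main__":
    print(f"one server:  {one_server:.4f} s per request")
    print(f"two servers: {two_servers:.4f} s per request")
    print(f"speedup:     {speedup:.1f}x")
```

Under these assumed numbers, doubling the hardware cuts the mean response time by roughly 5.5 times rather than 2 times, because a server running close to saturation spends most of each request's time in the queue rather than doing work.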
Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
wikitech-l@lists.wikimedia.org