William Allen Simpson wrote:
From time to time, I'm sure we've all noticed that Wikipedia slows to a crawl. Last night (local time), for about 15-20 minutes, reading was poor and writing was nearly impossible; see:
April 20, 2006, 10:36 pm http://www.thewritingpot.com/wikistatus/
What is "local time"? Please state your times in UTC. The page you link to doesn't go back as far as April 20, and it doesn't appear to have any archive links.
In any case, there's not much point in complaining about slow response times a day after the fact. As I told you before, the best place to contribute to this sort of thing is on #wikimedia-tech.
http://mail.wikimedia.org/pipermail/wikitech-l/2006-April/034991.html
I tried looking at the site from various views. What struck me was that no matter where I looked from here in the US, east or west or central, all traffic seems to go to Florida, even when the servers are not responding.
No failover to other clusters?
There are no other clusters which fill the same role as pmtpa. Go to this page:
http://meta.wikimedia.org/wiki/Profiling/20051208
and tell me how fast the site would be if every one of those Database::query or memcached::get calls required a couple of transatlantic RTTs. Using centralised caches improves the hit rate, and keeping them within a few kilometres of the apache servers makes the latency acceptable.
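To put rough numbers on that argument, here is a minimal back-of-the-envelope sketch. Every figure in it is a hypothetical assumption chosen for illustration (50 backend calls per page view, 2 round trips per call, 0.5 ms RTT inside a data centre, 100 ms across the Atlantic); none of them is taken from the profiling page. It also assumes the calls are issued one after another rather than in parallel.

```python
def backend_wait_ms(calls: int, rtts_per_call: int, rtt_ms: float) -> float:
    """Time spent waiting on the network when calls are issued sequentially."""
    return calls * rtts_per_call * rtt_ms


# Hypothetical figures, for illustration only (not measured values).
CALLS_PER_PAGE = 50           # Database::query + memcached::get calls per page view
RTTS_PER_CALL = 2             # "a couple" of round trips per call
LOCAL_RTT_MS = 0.5            # caches a few kilometres from the apaches
TRANSATLANTIC_RTT_MS = 100.0  # caches on another continent

local = backend_wait_ms(CALLS_PER_PAGE, RTTS_PER_CALL, LOCAL_RTT_MS)
remote = backend_wait_ms(CALLS_PER_PAGE, RTTS_PER_CALL, TRANSATLANTIC_RTT_MS)

print(f"same data centre: {local:.0f} ms of network wait per page")
print(f"transatlantic:    {remote:.0f} ms of network wait per page")
```

Under these assumed numbers the network wait alone grows from tens of milliseconds to several seconds per page view, which is the point about keeping the caches close to the apache servers.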
Also, the DNS stopped serving inverse addresses. Compare:
[...]
That 84.40.24.22 inverse is only at 2 DNServers both located on the same subnet (very bad practice):
Maybe you should complain to whoever owns those servers.
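For anyone who wants to check the same-subnet claim for a set of nameservers, a minimal sketch using Python's standard ipaddress module follows. The addresses in the example are hypothetical documentation addresses, not the real servers, and the /24 prefix length is an assumption; the actual subnet size would have to come from whoever runs the network.

```python
import ipaddress


def share_subnet(addresses, prefix_len=24):
    """Return True if every address falls inside the same network of the given prefix length."""
    networks = {
        # strict=False masks off the host bits, giving the enclosing network.
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return len(networks) == 1


# Hypothetical nameserver addresses (RFC 5737 documentation ranges).
same = ["192.0.2.10", "192.0.2.20"]
different = ["192.0.2.10", "198.51.100.20"]

print(share_subnet(same))       # both servers sit in one /24
print(share_subnet(different))  # servers sit in separate /24s
```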
[...]
However, that loss of DNS responses from the same subnet leads to the conclusion the subnet might be under congestive collapse. That is, this lag might not be produced by wikimedia itself, but a problem with the link to or within the facility.
I very much doubt it. Did you try testing for packet loss by pinging a Wikimedia server?
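A packet loss test of that kind could be scripted roughly as below. This is only a sketch: it assumes a Unix-like ping that accepts -c and prints a summary line of the form "N packets transmitted, M received" (or "M packets received"), and the host passed to ping_loss is whatever server you choose to test.

```python
import re
import subprocess

# Matches the summary line printed by common Linux and BSD/macOS ping versions.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output: str) -> float:
    """Compute the packet loss percentage from the text output of ping."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


def ping_loss(host: str, count: int = 20) -> float:
    """Run ping against host and return the observed loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the output
    )
    return packet_loss_percent(result.stdout)
```

Running ping_loss against a Wikimedia server while the site is slow, and again when it is healthy, would show whether loss on the path is actually a factor.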
Is there any other data that might correspond?
Does anybody have clues or notes on what actually might have been happening at the time? RTG/MRTG?
Our MRTG stuff is still down following the loss of larousse, but you can still use these:
http://ganglia.wikimedia.org/
http://tools.wikimedia.de/~leon/stats/reqstats/
https://wikitech.leuksman.com/view/Server_admin_log
-- Tim Starling