Jürgen Herz wrote:
The RTT from me (Germany) to knams is the same as to pmtpa (around 60ms), but it's 350ms to yaseo. That's of course a lot higher, but if yaseo had the right data available, that would still be a lot faster than the dozens of seconds (if not timing out) the knams Squids need to deliver pages.
I must admit it's not always slow, but when it is, it stays slow for a long time (mostly in the evening hours). Yet I can't see any noticeable problems on Ganglia. I know that Wikipedia's servers have to serve a lot of requests, and that the volume has been increasing constantly (judging from the graphs at NOC), but when it's slow there is no higher load and no spike in requests -- strange. Maybe it's some internal reason, like waiting for the NFS server, but I'm sure you'll make it run smoothly again.
The main reason for slow squid service times lately seems to have been memory issues. A couple of days ago, one of the knams squids was very slow (often tens of seconds) because it was swapping, and another was heading the same way.
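Swap pressure of that kind is easy to check for from /proc/meminfo. The sketch below is only an illustration, not anything we run in production: the helper names are made up for this example, and the 50% threshold is an arbitrary placeholder.

```python
# Sketch: judge whether a host looks like it is swapping, from
# /proc/meminfo-style data (Linux). Hypothetical helpers; the 50%
# threshold is an arbitrary illustration, not a tuned value.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kilobytes}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])  # values are in kB
    return info

def swap_used_fraction(info):
    """Fraction of configured swap in use; 0.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total

def looks_like_swapping(info, threshold=0.5):
    """True if swap usage exceeds the (illustrative) threshold."""
    return swap_used_fraction(info) > threshold
```

On a live host you would feed it the contents of /proc/meminfo; sustained nonzero si/so columns in vmstat are the more direct signal that the box is actively paging.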
System administration issues like this are a very common cause of slowness. There's no magic bullet to solve it -- it's just a matter of progressively improving our techniques, and increasing the size of the sysadmin team. Lack of hardware may be an issue for certain services, but identifying which services are the problem, determining what we need to order, and then working out which part of the chain will give out next, is no easy task. We have 3 squid clusters and 2 apache clusters with their own memcached, DB, NFS and search -- if any one of those services has a problem, it will lead to a slow user experience. To add to the headache, many reports of slowness are due to problems with the client network rather than with our servers.
Luckily most of our monitoring statistics are public, so the entry barrier to this kind of performance analysis is low. I'm glad you're taking an interest. If you want to offer advice on a real-time basis, the #wikimedia-tech channel on irc.freenode.net is the best place to do it.
-- Tim Starling