On Tue, Nov 17, 2009 at 7:25 AM, Carcharoth carcharothwp@googlemail.com wrote:
And to be honest, if I had Googled myself some understanding of this, I may have ended up even more confused about it. If I had asked questions like this on the wikitech-l mailing list, would I have been told to Google the answer?
No, but you didn't. The message was originally posted on wikitech-l, and wasn't meant for wikien-l. It was a brief technical summary of why the site crashed, directed toward people who know the site architecture and would get useful information from the description. Explaining what all the terms mean wouldn't have served the purpose of the original message, which was to inform the people knowledgeable about and/or responsible for the site's operation, so that they could keep the problem in mind in case it happened again. I don't know what the point was of forwarding it to wikien-l, since it doesn't contain any information that's useful to users.
The technical details are, in something closer to layman's terms: Andrew updated the software (scapped), and some servers crashed. Looking at various monitoring tools (Nagios/Ganglia), he figured out that some of the older, less powerful (4-CPU) application servers (Apaches) didn't have enough resources to handle the update properly (went into swap). Unfortunately, this somehow (I'm not clear on this part) drove an important caching server (memcached node) to crash as well. The loss of caching overloaded the database, since requests that normally would have been answered from the cache had to go to the database instead. This made the site mostly inaccessible for about ten minutes, until the caches were repopulated enough to bring database load back to normal levels.
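For anyone who wants to see the caching part concretely, here is a toy sketch of the general pattern, not the actual MediaWiki or memcached code. Every name in it (Database, CachedStore, serve, lose_cache) is made up for illustration, and the numbers are arbitrary. It just counts how many requests fall through to the database when the cache is warm versus right after the cache is lost.

```python
class Database:
    """Hypothetical stand-in for the real database; it only counts queries."""

    def __init__(self):
        self.queries = 0

    def load(self, key):
        self.queries += 1
        return f"value-for-{key}"


class CachedStore:
    """Look in the cache first; fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # plays the role of the caching server

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.load(key)  # cache miss: the database does the work
        self.cache[key] = value    # repopulate the cache for next time
        return value

    def lose_cache(self):
        """Simulate the caching server crashing and coming back empty."""
        self.cache.clear()


def serve(store, keys):
    """Serve a batch of requests; return how many reached the database."""
    before = store.db.queries
    for key in keys:
        store.get(key)
    return store.db.queries - before


# 1000 requests spread over 100 distinct pages (arbitrary numbers).
requests = [f"page{i % 100}" for i in range(1000)]

db = Database()
store = CachedStore(db)

cold = serve(store, requests)         # initial fill of the cache
warm = serve(store, requests)         # normal operation
store.lose_cache()                    # the cache goes away
after_crash = serve(store, requests)  # misses hit the database again
recovered = serve(store, requests)    # cache repopulated

print(cold, warm, after_crash, recovered)
```

In this toy version the spike is small because each page only has to be fetched once. On a real site many requests for the same uncached item arrive at the same time, so the extra database load is much larger until the caches fill back up.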
As you can see, this doesn't really contain any info useful to anyone but server admins. Which is why it was originally posted to wikitech-l, not wikien-l.