Zwinger was out of commission for about half an hour starting at 22:20 UTC. Not entirely sure what happened, but the syslog records a number of out-of-memory kills of the web server.
Symptoms included very high load, very slow response on NFS, very *very* slow response on interactive terminals, and timeouts on attempted ssh logins. The machine seems OK after a reboot.
Ain't single points of failure grand? :)
Gwicke's been experimenting with the Coda distributed filesystem. Hopefully it wasn't involved in the crash. :) If it works nicely, it could be more reliable than our current center-heavy NFS setup for sharing files with the web servers.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org