On 14/03/11 11:48, William Allen Simpson wrote:
> Secure basically fell over for a while and generated nothing but proxy errors. I'm not sure that's what really happened. It may have been a complete inability to actually send or receive data, resulting in a timeout of some sort.
> Take a look at the Ganglia graphs. Free memory gone. Big spike in processes. Big drop in network activity!
It was caused by a CPU overload across the entire Apache cluster at that time. Secure and every other frontend proxy would have reported errors. Domas and I traced it back to job queue cache invalidations from an edit to [[Template:Reflist]] on the English Wikipedia.
Note that the free memory isn't gone. RRDtool has the very unscientific practice of starting the vertical scale at something other than zero, so a partial drop looks like a drop to nothing. Memory usage rose because processes use memory and, as you noted, the number of processes increased. This is because they were queueing, waiting for the overloaded backend cluster to serve them.
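As a rough illustration of the queueing effect, here is a minimal sketch using Little's law (mean requests in flight = arrival rate x mean time per request). All figures are made up for illustration and are not measurements from the incident; the function names are hypothetical, and the sketch assumes one proxy process per in-flight request.

```python
def processes_in_flight(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: mean number of in-flight requests, assuming one process each."""
    return requests_per_second * seconds_per_request


def memory_in_use_mb(processes: float, mb_per_process: float) -> float:
    """Memory held by the waiting processes, assuming a fixed footprint per process."""
    return processes * mb_per_process


# Hypothetical figures, chosen only to show the shape of the effect.
ARRIVAL_RATE = 200.0    # requests per second reaching the proxy
MB_PER_PROCESS = 20.0   # assumed memory footprint of one process

# Same arrival rate, but the backend takes 4 s instead of 0.25 s to answer.
normal = processes_in_flight(ARRIVAL_RATE, 0.25)
overloaded = processes_in_flight(ARRIVAL_RATE, 4.0)

print("normal:    ", normal, "processes,", memory_in_use_mb(normal, MB_PER_PROCESS), "MB")
print("overloaded:", overloaded, "processes,", memory_in_use_mb(overloaded, MB_PER_PROCESS), "MB")
```

With the arrival rate unchanged, a slower backend alone multiplies the number of waiting processes, and memory use follows.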
-- Tim Starling