On 21/04/13 05:29, David Gerard wrote:
> So where would I start looking to work out what's going on?
If there is any kind of site issue at WMF, I usually start with
Ganglia. It does take some practice to be able to read it correctly,
but it gives you information far more quickly than just about anything
else. My notes on WMF incident response give some hints about how to
use it, as well as discussing some other tools:
https://wikitech.wikimedia.org/wiki/Incident_response
If the problem seems to be downstream of MediaWiki, then profiling is
usually the next thing to look at. Wikipedia has been using DIY
profiling to diagnose site performance issues since it was on a single
server.
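To give a rough idea of what DIY profiling means here, the following is a minimal sketch in Python rather than PHP. It is not MediaWiki's actual profiler; the names profile_section, totals, calls and report are hypothetical and only illustrate the idea of accumulating wall-clock time per named section.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical names throughout; this is an illustration, not MediaWiki's profiler.
totals = defaultdict(float)  # section name -> accumulated wall-clock seconds
calls = defaultdict(int)     # section name -> number of times the section ran


@contextmanager
def profile_section(name):
    """Time the enclosed block and add the elapsed time to the named section."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the time even if the block raised an exception.
        totals[name] += time.perf_counter() - start
        calls[name] += 1


def report():
    """Return (name, total_seconds, call_count) rows, slowest section first."""
    rows = [(name, totals[name], calls[name]) for name in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Wrapping a suspect call in "with profile_section('search-backend'):" and looking at report() afterwards shows where wall-clock time went, including time spent waiting on another service.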
> * Sometimes it isn't, e.g. this afternoon when the site was running
> like a slug and load average was 0.8 with nothing amiss in top.
Processes in the "S" state do not contribute to the load average,
whether or not users are waiting for them. For example, PHP may be
waiting for Lucene. Try the section in the incident response notes
under "slow backend service".
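As a rough illustration of the point about process states, here is a sketch that tallies states on a Linux box and counts how many would contribute to the load average. It assumes Linux /proc, and it counts processes rather than threads, so it only approximates what the kernel does. The function names are made up for this example.

```python
import os
from collections import Counter

# On Linux, runnable (R) and uninterruptible-sleep (D) tasks count towards
# the load average; interruptible sleep (S) does not.
LOAD_STATES = {"R", "D"}


def state_from_stat(stat_text):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is in parentheses and may contain spaces, so split
    after the last ')' rather than on whitespace from the start.
    """
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def load_contributors(states):
    """Count how many of the given state letters contribute to load average."""
    return sum(1 for s in states if s in LOAD_STATES)


def read_proc_states(proc="/proc"):
    """Return the state letter of every process visible under /proc."""
    states = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states.append(state_from_stat(f.read()))
        except OSError:
            continue  # the process exited while we were looking
    return states


if __name__ == "__main__":
    states = read_proc_states()
    print(Counter(states))
    print("contributing to load:", load_contributors(states))
```

A large number of PHP processes in "S" alongside a low count of contributors is consistent with the web servers waiting on a backend rather than doing work themselves.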
-- Tim Starling