On Fri, Oct 10, 2003 at 12:23:18AM +1000, Tim Starling wrote:
I wrote a little perl script to monitor the health of
Wikipedia by
regularly checking the file size of the main log. The idea was that it
would email me if apache crashed. I soon discovered something rather
interesting and unexpected.
Once every 15 minutes or so, Wikipedia shuts down for a minute or two.
This is bad, needless to say. During these episodes, it shows few if any
log entries. Load is depressed slightly (from ~20 to ~10), so CPU
starvation didn't seem likely. I suspected database troubles.
Attached is a zip file containing the output from SHOW PROCESSLIST.
There is one file showing "typical" output, between the episodes, and
three showing output in the middle of an episode. One thing stands out
like a sore thumb:
SELECT 1 FROM user_newtalk WHERE user_ip='167.206.112.85'
SELECT 1 FROM user_newtalk WHERE user_ip='217.244.15.107'
SELECT 1 FROM user_newtalk WHERE user_ip='80.55.166.58'
SELECT 1 FROM user_newtalk WHERE user_ip='66.196.90.11'
SELECT 1 FROM user_newtalk WHERE user_ip='172.176.254.188'
SELECT 1 FROM user_newtalk WHERE user_ip='220.54.212.24'
SELECT 1 FROM user_newtalk WHERE user_id=7580
SELECT 1 FROM user_newtalk WHERE user_ip='194.78.48.226'
SELECT 1 FROM user_newtalk WHERE user_ip='80.225.14.29'
SELECT 1 FROM user_newtalk WHERE user_ip='144.32.128.73'
SELECT 1 FROM user_newtalk WHERE user_ip='62.216.15.127'
SELECT 1 FROM user_newtalk WHERE user_ip='134.151.225.179'
SELECT 1 FROM user_newtalk WHERE user_ip='213.81.145.123'
SELECT 1 FROM user_newtalk WHERE user_ip='129.137.208.159'
SELECT 1 FROM user_newtalk WHERE user_ip='195.93.72.17'
SELECT 1 FROM user_newtalk WHERE user_ip='218.19.141.2'
SELECT 1 FROM user_newtalk WHERE user_ip='68.11.187.242'
SELECT 1 FROM user_newtalk WHERE user_ip='63.34.208.93'
SELECT 1 FROM user_newtalk WHERE user_ip='81.196.21.16'
SELECT 1 FROM user_newtalk WHERE user_ip='202.156.2.82'
Millions of user_newtalk requests, conspicuously absent from the typical
dump. Many of them are doing "statistics", whatever that means, and the
rest are locked. One side of me is curious as to what "statistics" means
and why it produces this behaviour. The other side of me says
Could these hanging selects be just a part of the phenomenon?
I mean, assume, the mysql has some reason not to respond, but
still accepts new requests. Then all anonymous users reading
a page will add a new process looking for a newtalk flag. They
don't cause the slowdown, they are its victims.
So the question is: Why does it slow down every 15 minutes?
Do we have some cron jobs running every quarter of an hour?
Regards,
JeLuf