[00:18] <TimStarling> mysql> SELECT 'Listusers' as type, 2 AS namespace, user_name AS title, user_name as value, user_id, COUNT(ug_group) as numgroups FROM `user` LEFT JOIN `user_groups` ON user_id=ug_user GROUP BY user_name, user_id ORDER BY value LIMIT 50,50; [00:18] <TimStarling> 50 rows in set (32.94 sec) [00:18] <TimStarling> mysql> SELECT 'Listusers' as type, 2 AS namespace, user_name AS title, user_name as value, user_id, COUNT(ug_group) as numgroups FROM `user` LEFT JOIN `user_groups` ON user_id=ug_user GROUP BY user_name ORDER BY value LIMIT 50,50; [00:18] <TimStarling> 50 rows in set (0.01 sec)
Some general thoughts about this while it's on my mind: the key here to minimising the impact of this kind of problem is isolation, rather than distribution. We already have good isolation for search, and improving isolation for images -- if one of those two services goes offline then the rest should stay up, unaffected. Maybe it's time we introduced a "basic" query group, containing those queries required for pages views. Then we could send all "basic" queries to a dedicated cluster, and all other queries to a second isolated cluster. Then as long as we can keep the apache thread count low enough, any problem with those diverse special page queries would not affect page view performance.
We could go even further and split the apache cluster into an "ordinary page view" cluster and an "everything else" cluster. This would mitigate DoS attacks on apache resources.
Any comments?
-- Tim Starling