On sab, 2003-02-01 at 07:44, Erik Moeller wrote:
Adding composite indexes won't help there. What you need is an extra redundant table that contains the number of wanted links.
We have a table that contains broken links, which is used in the Most Wanted query. I have no information about the size of these tables in our current database, but I suspect that, again, indexes are not properly used, otherwise the bad performance of this query (which has prompted Magnus to cache the whole page and Brion to create a "miser" mode in which it cannot be loaded) is not explainable.
EXPLAIN SELECT bl_to, COUNT( DISTINCT bl_from ) as nlinks FROM brokenlinks GROUP BY bl_to HAVING nlinks > 1 ORDER BY nlinks DESC LIMIT 0, 50; +-------------+-------+---------------+-------+---------+------+--------+---------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------------+-------+---------------+-------+---------+------+--------+---------------------------------+ | brokenlinks | index | NULL | bl_to | 255 | NULL | 319493 | Using temporary; Using filesort | +-------------+-------+---------------+-------+---------+------+--------+---------------------------------+
The problem is that it has to count up *every* row, then sort based on the counts just to return the top 50. The grouping may be sped up by the index, but it's still going to be a bit of work. (Just tried it, took 37.59 seconds.)
Jan has suggested having a summary table that keeps track of the counts, so we can access them directly. I'm inclined to agree; we could update it on page saves (and deletes and renames and undeletes!) to save the bother of the group/count/sort behemoth. Something like:
CREATE TABLE wantedpages ( wanted_title varchar(255) binary NOT NULL default '', wanted_count int unsigned NOT NULL default 0, UNIQUE KEY wanted_title, INDEX wanted_count );
Index on the count, certainly, to speed the sort for the most wanted display. Index on the title should speed picking the right ones on updates, right?
Anyway, this would turn the most wanted query into:
SELECT wanted_title,wanted_count FROM wantedpages WHERE wanted_count > 1 ORDER BY wanted_count DESC LIMIT 0, 50;
which should be quite a bit faster.
I suspect we can be smarter about how we update links and brokenlinks as well; currently we just delete every row pertaining to the page we're dealing with and re-insert it.
-- brion vibber (brion @ pobox.com)