On sab, 2003-02-01 at 07:44, Erik Moeller wrote:
Adding
composite indexes won't help there. What you need is an extra
redundant table that contains the number of wanted links.
We have a table that contains broken links, which is used in the Most
Wanted query. I have no information about the size of these tables in our
current database, but I suspect that, again, indexes are not properly
used, otherwise the bad performance of this query (which has prompted
Magnus to cache the whole page and Brion to create a "miser" mode in which
it cannot be loaded) is not explainable.
EXPLAIN
SELECT bl_to, COUNT( DISTINCT bl_from ) as nlinks
FROM brokenlinks
GROUP BY bl_to
HAVING nlinks > 1
ORDER BY nlinks DESC
LIMIT 0, 50;
+-------------+-------+---------------+-------+---------+------+--------+---------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------------+-------+---------------+-------+---------+------+--------+---------------------------------+
| brokenlinks | index | NULL | bl_to | 255 | NULL | 319493 | Using temporary;
Using filesort |
+-------------+-------+---------------+-------+---------+------+--------+---------------------------------+
The problem is that it has to count up *every* row, then sort based on
the counts just to return the top 50. The grouping may be sped up by the
index, but it's still going to be a bit of work. (Just tried it, took
37.59 seconds.)
Jan has suggested having a summary table that keeps track of the counts,
so we can access them directly. I'm inclined to agree; we could update
it on page saves (and deletes and renames and undeletes!) to save the
bother of the group/count/sort behemoth. Something like:
CREATE TABLE wantedpages (
wanted_title varchar(255) binary NOT NULL default '',
wanted_count int unsigned NOT NULL default 0,
UNIQUE KEY wanted_title,
INDEX wanted_count
);
Index on the count, certainly, to speed the sort for the most wanted
display. Index on the title should speed picking the right ones on
updates, right?
Anyway, this would turn the most wanted query into:
SELECT wanted_title,wanted_count
FROM wantedpages
WHERE wanted_count > 1
ORDER BY wanted_count DESC
LIMIT 0, 50;
which should be quite a bit faster.
I suspect we can be smarter about how we update links and brokenlinks as
well; currently we just delete every row pertaining to the page we're
dealing with and re-insert it.
-- brion vibber (brion @
pobox.com)