On sab, 2003-02-01 at 07:12, Jan Hidders wrote:
Sorry for replying to myself, but I just saw in CVS
that the SQL is
something like this:
SELECT HIGH_PRIORITY cur_id
FROM cur
WHERE cur_namespace=" . "{$ns} AND cur_title='" . wfStrencode( $t )
. "'";
but if I look in buildTables.inc we have a separate index for namespace and
title. That should really be a combined index and that holds for everywhere
we do a look-up with namespace + title. Do we have those?
Combined index might make sense. In most cases it won't significantly
cut down the number of rows, but for some (main page, current events,
user:maveric149) it could take out a couple hundred rows by splitting
the page and talk page.
Also, a vaguely related question on the watchlist query. Currently the
return-whole-watchlist query looks like:
SELECT DISTINCT
cur_id,cur_namespace,cur_title,cur_comment,
cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new
FROM cur,watchlist
WHERE wl_user={$uid} AND wl_title=cur_title
AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
ORDER BY cur_timestamp DESC {$dolimit}
I did an 'explain' using my uid. (I have 1370 titles in my watchlist;
the larger # of estimated returned rows is presumably due to the weird
InnoDB optimizer, which is known to return sometimes-wrong numbers of
rows.)
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| table | type | possible_keys | key | key_len | ref |
rows | Extra |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| watchlist | ref | wl_user | wl_user | 4 | const |
3264 | where used; Using index; Using temporary; Using filesort |
| cur | ref | cur_namespace,cur_title | cur_title | 255 | watchlist.wl_title |
1 | where used |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
Is there a better way of doing this? We have a combined namespace/title
index on watchlist; but we want to pick up two variants in the
namespace. In most cases, just using the title as index on cur should
then be sufficient (it's relatively rare for titles to exist in multiple
groups of namespaces, mostly just in one namespace and the X_talk: next
to it).
And I'm not sure if we really need the resulting
cur_id. If we don't, then
indeed only the index is needed and that would really speed up the query.
99% of the time, we don't; we're just checking for existence. It should
be sufficient to just have a boolean 'exists/doesn't exist' field. When
we're checking existence, is it most efficient to SELECT the indexed
field, or COUNT(*), or what?
-- brion vibber (brion @
pobox.com)