On sab, 2003-02-01 at 07:12, Jan Hidders wrote:
Sorry for replying to myself, but I just saw in CVS that the SQL is something like this:
SELECT HIGH_PRIORITY cur_id FROM cur WHERE cur_namespace=" . "{$ns} AND cur_title='" . wfStrencode( $t ) . "'";
but if I look in buildTables.inc we have a separate index for namespace and title. That should really be a combined index and that holds for everywhere we do a look-up with namespace + title. Do we have those?
Combined index might make sense. In most cases it won't significantly cut down the number of rows, but for some (main page, current events, user:maveric149) it could take out a couple hundred rows by splitting the page and talk page.
Also, a vaguely related question on the watchlist query. Currently the return-whole-watchlist query looks like: SELECT DISTINCT cur_id,cur_namespace,cur_title,cur_comment, cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new FROM cur,watchlist WHERE wl_user={$uid} AND wl_title=cur_title AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1) ORDER BY cur_timestamp DESC {$dolimit}
I did an 'explain' using my uid. (I have 1370 titles in my watchlist; the larger # of estimated returned rows is presumably due to the weird InnoDB optimizer, which is known to return sometimes-wrong numbers of rows.)
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | watchlist | ref | wl_user | wl_user | 4 | const | 3264 | where used; Using index; Using temporary; Using filesort | | cur | ref | cur_namespace,cur_title | cur_title | 255 | watchlist.wl_title | 1 | where used | +-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
Is there a better way of doing this? We have a combined namespace/title index on watchlist; but we want to pick up two variants in the namespace. In most cases, just using the title as index on cur should then be sufficient (it's relatively rare for titles to exist in multiple groups of namespaces, mostly just in one namespace and the X_talk: next to it).
And I'm not sure if we really need the resulting cur_id. If we don't, then indeed only the index is needed and that would really speed up the query.
99% of the time, we don't; we're just checking for existence. It should be sufficient to just have a boolean 'exists/doesn't exist' field. When we're checking existence, is it most efficient to SELECT the indexed field, or COUNT(*), or what?
-- brion vibber (brion @ pobox.com)