On Tue, Feb 11, 2003 at 12:35:37AM -0800, Brion Vibber wrote:
On Sat, 2003-02-08 at 14:26, Brion Vibber wrote:
The watchlist:
SELECT DISTINCT cur_id,cur_namespace,cur_title,cur_comment,
  cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new
FROM cur,watchlist
WHERE wl_user=7457 AND wl_title=cur_title
  AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
ORDER BY inverse_timestamp
LIMIT 100;
+-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| table     | type | possible_keys                                | key       | key_len | ref                | rows | Extra                                                    |
+-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| watchlist | ref  | wl_user                                      | wl_user   |       4 | const              | 1730 | where used; Using index; Using temporary; Using filesort |
| cur       | ref  | cur_namespace,cur_title,name_title_timestamp | cur_title |     255 | watchlist.wl_title |    1 | where used                                               |
+-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
EXPLAIN doesn't give any different results if the namespace+1 is taken out, so adding near-duplicate rows to cover talk pages wouldn't help.
Strange. Did you force it to use the name_title_timestamp index with USE INDEX? MySQL sometimes chooses a smaller index than it should.
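To make that concrete, here is roughly what I mean by forcing the index. This is only a sketch: the select list is shortened, and I am assuming the hint goes on cur, since that is the table where the optimizer picked cur_title instead.

```sql
-- Sketch only: the same watchlist query, with an index hint on cur.
-- USE INDEX limits the optimizer to the listed index for that table.
EXPLAIN
SELECT DISTINCT cur_id, cur_namespace, cur_title, cur_timestamp
FROM cur USE INDEX (name_title_timestamp), watchlist
WHERE wl_user=7457
  AND wl_title=cur_title
  AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
ORDER BY inverse_timestamp
LIMIT 100;
```

If the "key" column for cur then shows name_title_timestamp, the hint was accepted. Whether that is actually faster is a separate question that only timing the query will answer.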
I'm not quite sure how to match up the indexes right. Note that watchlist has a composite index on wl_user, wl_namespace, and wl_title... but I don't know if it can match those up if part 1 of the index is a constant, and parts 2 and 3 match 1 and 2 of the other table's index.
They don't have to match. All that is required is that the columns on which watchlist is joined with cur form a prefix of some index on cur. So without the namespace+1 part the name_title_timestamp index should do, and with it probably too (though I'm not sure, because I don't know how MySQL handles the OR). As always, EXPLAIN will tell you whether it uses the right indices.
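A stripped-down version of the join shows the prefix rule in isolation. I am assuming here that name_title_timestamp is defined on (cur_namespace, cur_title, inverse_timestamp); that is a guess from its name, so check it with SHOW INDEX FROM cur first.

```sql
-- Assumption: name_title_timestamp covers
-- (cur_namespace, cur_title, inverse_timestamp), in that order.
-- The two join columns below are then a prefix of that index.
EXPLAIN
SELECT cur_id, cur_title
FROM watchlist, cur
WHERE wl_user=7457
  AND cur_namespace=wl_namespace
  AND cur_title=wl_title;
```

If the assumption holds, I would expect the cur row of the EXPLAIN output to list both watchlist.wl_namespace and watchlist.wl_title under "ref". If only wl_title shows up, MySQL is still matching on the title alone.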
Some prolific users have three thousand or more titles in their watchlist; the total number of distinct watched pages on the en.wiki is 30409.
Is there any way we could do this without temporary tables & filesorts & whatnot?
Yes, there is. You could add an extra inverse_timestamp column to the watchlist table that (very redundantly) stores the inverse_timestamp of the watched page. Obviously this would have to be updated each time the page is updated (and set when the page is added to the watchlist). If you then have an index on (wl_user, inverse_timestamp) and make sure MySQL uses it, it won't have to sort: the rows for one user are already stored in the index in the requested order.
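In SQL the idea would look something like the following. The column name wl_inverse_timestamp, the index name, the CHAR(14) type, the example title and the literal timestamp value are all made up for illustration, and talk-page handling (the namespace+1 case) is left out.

```sql
-- Hypothetical names and types throughout; adjust to the real schema.
ALTER TABLE watchlist
  ADD COLUMN wl_inverse_timestamp CHAR(14) NOT NULL DEFAULT '';
ALTER TABLE watchlist
  ADD INDEX wl_user_inverse_timestamp (wl_user, wl_inverse_timestamp);

-- On every page save, copy the page's new inverse_timestamp
-- to all watchlist rows that point at that page.
UPDATE watchlist
SET wl_inverse_timestamp='79969788916462'
WHERE wl_namespace=0 AND wl_title='Some_page';

-- The watchlist query can then read one user's rows in index order.
EXPLAIN
SELECT cur_id, cur_namespace, cur_title, cur_timestamp
FROM watchlist USE INDEX (wl_user_inverse_timestamp), cur
WHERE wl_user=7457
  AND cur_namespace=wl_namespace
  AND cur_title=wl_title
ORDER BY wl_inverse_timestamp
LIMIT 100;
```

The price is one extra UPDATE per page save, touching as many rows as there are users watching that page. Check the EXPLAIN output to see whether "Using temporary" and "Using filesort" really disappear before committing to the schema change.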
-- Jan Hidders