On Thu, Nov 27, 2008 at 10:22 AM, Ilmari Karonen nospam@vyznev.net wrote:
I was going to disagree with you, but it does seem there's something weird going on. Compare these queries:
mysql> SELECT page_namespace, page_title FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (4.63 sec)
mysql> SELECT page_namespace, page_title, page_id FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (4.90 sec)
mysql> SELECT page_namespace, page_title, page_is_redirect FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (30.56 sec)
mysql> SELECT * FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (41.23 sec)
This isn't just random variation either, but seems completely repeatable: including the page_is_redirect field (or, apparently, any field other than page_namespace, page_title or page_id) in the query, whether in the field list or in the WHERE clause, makes it run much more slowly. WTF?
The EXPLAINs indicate what the difference is, although why it should make a difference with no rows returned is a separate issue.
mysql> EXPLAIN SELECT page_namespace FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: ref possible_keys: name_title key: name_title key_len: 4 ref: const rows: 7863987 Extra: Using where; Using index 1 row in set (0.01 sec)
mysql> EXPLAIN SELECT page_is_redirect FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: ref possible_keys: name_title key: name_title key_len: 4 ref: const rows: 7863988 Extra: Using where 1 row in set (0.00 sec)
"Using index" is the critical bit here. The name_title index is on (page_namespace, page_title); it also includes (page_id), the primary key. Therefore a query for any of those three columns can be satisfied by examining the index alone, as long as the name_title index is being used for retrieval. Any other columns require examination of the data.
The question is why this should make any difference when no rows are returned. If you have to do something so close to a table scan, the best execution strategy might be to just read through the data and ignore the index altogether. So if MySQL determines that it can use the index for the whole scan, it might use that; if it thinks it will need to read the table data anyway, it might ignore the index and scan the table itself, on the theory that this is going to be more efficient if a large percentage of rows are returned. The latter strategy would turn out to be ineffective here, because no rows are actually returned and the first strategy would have resulted in scanning less data, but MySQL doesn't know that in advance.
The problem with that theory is that the EXPLAINs indicate identical execution strategies. If I were right, then the second query should be of type ALL, not ref. It could be EXPLAIN is lying, or that the reason for the discrepancy is something different. Running SHOW PROFILE (http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html) on localhost for a similar query on an old simplewiki database, I determine that the significant difference is in the "sending data" phase of the query, at least on my machine with that database (the effect is still visible, they're 0.04s vs. 0.14s or so). That doesn't tell me much, unfortunately.
The fact that the effect occurs on my local database indicates that this particular difference has absolutely nothing to do with views, though.