[Toolserver-l] SQL queries

Aryeh Gregor Simetrical+wikilist at gmail.com
Fri Nov 28 00:34:40 UTC 2008


On Thu, Nov 27, 2008 at 10:22 AM, Ilmari Karonen <nospam at vyznev.net> wrote:
> I was going to disagree with you, but it does seem there's something
> weird going on.  Compare these queries:
>
> mysql> SELECT page_namespace, page_title FROM page WHERE page_title LIKE
> '%fnord%' AND page_namespace=0;
> Empty set (4.63 sec)
>
> mysql> SELECT page_namespace, page_title, page_id FROM page WHERE
> page_title LIKE '%fnord%' AND page_namespace=0;
> Empty set (4.90 sec)
>
> mysql> SELECT page_namespace, page_title, page_is_redirect FROM page
> WHERE page_title LIKE '%fnord%' AND page_namespace=0;
> Empty set (30.56 sec)
>
> mysql> SELECT * FROM page WHERE page_title LIKE '%fnord%' AND
> page_namespace=0;
> Empty set (41.23 sec)
>
> This isn't just random variation either, but seems completely
> repeatable: including the page_is_redirect field (or, apparently, any
> field other than page_namespace, page_title or page_id) in the query,
> whether in the field list or in the WHERE clause, makes it run much more
> slowly.  WTF?

The EXPLAINs indicate what the difference is, although why it should
make a difference with no rows returned is a separate issue.

mysql> EXPLAIN SELECT page_namespace FROM page WHERE page_title LIKE
'%fnord%' AND page_namespace=0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: ref
possible_keys: name_title
          key: name_title
      key_len: 4
          ref: const
         rows: 7863987
        Extra: Using where; Using index
1 row in set (0.01 sec)

mysql> EXPLAIN SELECT page_is_redirect FROM page WHERE page_title LIKE
 '%fnord%' AND page_namespace=0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: ref
possible_keys: name_title
          key: name_title
      key_len: 4
          ref: const
         rows: 7863988
        Extra: Using where
1 row in set (0.00 sec)

"Using index" is the critical bit here.  The name_title index is on
(page_namespace, page_title); it also includes (page_id), the primary
key.  Therefore a query for any of those three columns can be
satisfied by examining the index alone, as long as the name_title
index is being used for retrieval.  Any other columns require
examination of the data.

The question is why this should make any difference when no rows are
returned.  If you have to do something so close to a table scan, the
best execution strategy might be to just read through the data and
ignore the index altogether.  So if MySQL determines that it can use
the index for the whole scan, it might use that; if it thinks it will
need to read the table data anyway, it might ignore the index and scan
the table itself, on the theory that this is going to be more
efficient if a large percentage of rows are returned.  The latter
strategy would turn out to be ineffective here, because no rows are
actually returned and the first strategy would have resulted in
scanning less data, but MySQL doesn't know that in advance.

The problem with that theory is that the EXPLAINs indicate identical
execution strategies.  If I were right, then the second query should
be of type ALL, not ref.  It could be EXPLAIN is lying, or that the
reason for the discrepancy is something different.  Running SHOW
PROFILE (http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html) on
localhost for a similar query on an old simplewiki database, I
determine that the significant difference is in the "sending data"
phase of the query, at least on my machine with that database (the
effect is still visible, they're 0.04s vs. 0.14s or so).  That doesn't
tell me much, unfortunately.

The fact that the effect occurs on my local database indicates that
this particular difference has absolutely nothing to do with views,
though.



More information about the Toolserver-l mailing list