2009/10/23 Aryeh Gregor Simetrical+wikilist@gmail.com:
On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar hippytrail@gmail.com wrote:
Yes I didn't specify tl_namespace
In MySQL that will usually make it impossible to effectively use an index on (tl_namespace, tl_title), so it's essential that you specify the NS. (Which you should anyway to avoid hitting things like [[Template talk:Infobox language]].) Some DBMSes (including sometimes MySQL >= 5.0, although apparently not here) are smart enough to use this kind of index pretty well even if you don't specify the namespace, but it would still be somewhat more efficient to specify it -- the DB would have to do O(1/n) times as many index lookups, where n is the number of namespaces.
and when I check for which columns have keys I could see none: mysql> describe templatelinks; +--------------+-----------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------+-----------------+------+-----+---------+-------+ | tl_from | int(8) unsigned | NO | | 0 | | | tl_namespace | int(11) | NO | | 0 | | | tl_title | varchar(255) | NO | | | | +--------------+-----------------+------+-----+---------+-------+ 3 rows in set (0.01 sec)
The toolserver database uses views. In MySQL, views can't have indexes themselves, but your query is rewritten to run against the real table -- which you can't access directly, but which does have indexes. EXPLAIN is your best bet here:
mysql> EXPLAIN SELECT tl_from FROM templatelinks WHERE tl_title IN ('Infobox_Language', 'Infobox_language'); +----+-------------+---------------+-------+---------------+---------+---------+------+-----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+---------------+---------+---------+------+-----------+--------------------------+ | 1 | SIMPLE | templatelinks | index | NULL | tl_from | 265 | NULL | 149740990 | Using where; Using index | +----+-------------+---------------+-------+---------------+---------+---------+------+-----------+--------------------------+ 1 row in set (0.00 sec)
mysql> EXPLAIN SELECT tl_from FROM templatelinks WHERE tl_namespace=10 AND tl_title IN ('Infobox_Language', 'Infobox_language'); +----+-------------+---------------+-------+---------------+--------------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+---------------+--------------+---------+------+------+--------------------------+ | 1 | SIMPLE | templatelinks | range | tl_namespace | tl_namespace | 261 | NULL | 6949 | Using where; Using index | +----+-------------+---------------+-------+---------------+--------------+---------+------+------+--------------------------+ 1 row in set (0.00 sec)
Note the number of rows scanned in each case. Your query was scanning all of templatelinks, the other is retrieving the exact rows needed and not looking at any others ("type" = "index" vs. "range"). The reason for this is given in the "possible_keys" column: MySQL can find no keys that are usable for lookup, if you omit tl_namespace.
Thanks for the very informative reply. I already knew most of this stuff passively except database/SQL views. Now I've just got to put it into more practice.
Andrew Dunbar (hippietrail)
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l