[Toolserver-l] SQL queries

Marc A. Pelletier marc at uberbox.org
Thu Nov 27 15:55:04 UTC 2008


Ilmari Karonen wrote:
> Mashiah Davidson wrote:
>    
>> I've implemented a bot, which for Ruwiki solves the problem of
>> lonelypages and even isolated articles. The query you've just cited (an
>> analogue to be honest) is the key point there as well and anyway there
>> is a possibility to make it faster than one you cited on the TS. The
>> problem here is that we deal with views, not the db itself, so this
>> blocks further optimization somehow.
>>      
>
> I was going to disagree with you, but it does seem there's something
> weird going on.  Compare these queries:
>
> mysql>  SELECT page_namespace, page_title FROM page WHERE page_title LIKE
> '%fnord%' AND page_namespace=0;
> Empty set (4.63 sec)
>
> mysql>  SELECT page_namespace, page_title, page_id FROM page WHERE
> page_title LIKE '%fnord%' AND page_namespace=0;
> Empty set (4.90 sec)
>
> mysql>  SELECT page_namespace, page_title, page_is_redirect FROM page
> WHERE page_title LIKE '%fnord%' AND page_namespace=0;
> Empty set (30.56 sec)
>
> mysql>  SELECT * FROM page WHERE page_title LIKE '%fnord%' AND
> page_namespace=0;
> Empty set (41.23 sec)
>
> This isn't just random variation either, but seems completely
> repeatable: including the page_is_redirect field (or, apparently, any
> field other than page_namespace, page_title or page_id) in the query,
> whether in the field list or in the WHERE clause, makes it run much more
> slowly.  WTF?
>
>    
That behavior is consistent with lack of indices, or the wrong *type* of 
indices, on the culpable columns forcing a seq scan.  I should point out 
that it's not always a good idea to index less frequently selected 
columns because it increases the cost of insertions and updates 
significantly, so the solution isn't necessarily to add some either.  
Sometimes, if you are going to be doing a lot of selects on those 
columns, the creation of a temporary table with the needed index is, in 
fact, the most efficient solution despite the setup cost.

-- Marc




More information about the Toolserver-l mailing list