Mashiah Davidson wrote:
I've implemented a bot, which for Ruwiki
solves the problem of
lonelypages and even isolated articles. The query you've just cited (an
analogue to be honest) is the key point there as well and anyway there
is a possibility to make it faster than one you cited on the TS. The
problem here is that we deal with views, not the db itself, so this
blocks further optimization somehow.
I was going to disagree with you, but it does seem there's something
weird going on. Compare these queries:
mysql> SELECT page_namespace, page_title FROM page WHERE page_title LIKE
'%fnord%' AND page_namespace=0;
Empty set (4.63 sec)
mysql> SELECT page_namespace, page_title, page_id FROM page WHERE
page_title LIKE '%fnord%' AND page_namespace=0;
Empty set (4.90 sec)
mysql> SELECT page_namespace, page_title, page_is_redirect FROM page
WHERE page_title LIKE '%fnord%' AND page_namespace=0;
Empty set (30.56 sec)
mysql> SELECT * FROM page WHERE page_title LIKE '%fnord%' AND
page_namespace=0;
Empty set (41.23 sec)
This isn't just random variation either, but seems completely
repeatable: including the page_is_redirect field (or, apparently, any
field other than page_namespace, page_title or page_id) in the query,
whether in the field list or in the WHERE clause, makes it run much more
slowly. WTF?
That behavior is consistent with lack of indices, or the wrong *type* of
indices, on the culpable columns forcing a seq scan. I should point out
that it's not always a good idea to index less frequently selected
columns because it increases the cost of insertions and updates
significantly, so the solution isn't necessarily to add some either.
Sometimes, if you are going to be doing a lot of selects on those
columns, the creation of a temporary table with the needed index is, in
fact, the most efficient solution despite the setup cost.
-- Marc