Ilmari Karonen wrote:
Mashiah Davidson wrote:
I've implemented a bot, which for Ruwiki solves the problem of lonelypages and even isolated articles. The query you've just cited (an analogue to be honest) is the key point there as well and anyway there is a possibility to make it faster than one you cited on the TS. The problem here is that we deal with views, not the db itself, so this blocks further optimization somehow.
I was going to disagree with you, but it does seem there's something weird going on. Compare these queries:
mysql> SELECT page_namespace, page_title FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (4.63 sec)
mysql> SELECT page_namespace, page_title, page_id FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (4.90 sec)
mysql> SELECT page_namespace, page_title, page_is_redirect FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (30.56 sec)
mysql> SELECT * FROM page WHERE page_title LIKE '%fnord%' AND page_namespace=0; Empty set (41.23 sec)
This isn't just random variation either, but seems completely repeatable: including the page_is_redirect field (or, apparently, any field other than page_namespace, page_title or page_id) in the query, whether in the field list or in the WHERE clause, makes it run much more slowly. WTF?
That behavior is consistent with lack of indices, or the wrong *type* of indices, on the culpable columns forcing a seq scan. I should point out that it's not always a good idea to index less frequently selected columns because it increases the cost of insertions and updates significantly, so the solution isn't necessarily to add some either. Sometimes, if you are going to be doing a lot of selects on those columns, the creation of a temporary table with the needed index is, in fact, the most efficient solution despite the setup cost.
-- Marc