On Sun, Nov 22, 2009 at 02:45, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
Yeah, pretty much. If you had an index on (page_is_redirect, page_namespace, page_title) or such, that would speed it up significantly (at least in MySQL). It would have to scan through the whole index, but that only contains three columns plus a row id of some kind, so it should be quite a lot faster than scanning the whole table. But this isn't a likely query for optimization.
That's correct. We run this query once, and then we do a SELECT for each title (which only takes a few millis, because it uses an index) and work with the result. Building the index would probably take longer than 5 or 10 minutes.
The DB access is fast compared to all the other stuff we do, so there's not much need to optimize it. MySQL with the indexes in tables.sql suits us fine so far, and I have no reason to suspect any other DB would be faster.
The one thing that is slow is builiding the indexes after the data has been imported (eight hours or so). Maybe we could omit some indexes that are not used in our application, but I haven't really looked into that.
Christopher