On Sat, Nov 21, 2009 at 7:02 PM, Anthony wikimail@inbox.org wrote:
You could build an index on page_is_redirect in the "page" table (see enwiki-*-page.sql.gz). But I'm pretty sure Postgresql wouldn't use it, and would do a sequential scan, since pretty much all the pages are going to have to be accessed anyway.
Five or ten minutes sounds about right. I can't imagine this is a query you want to run over and over again. If it is, you'd probably want to use partitioning (http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html), but you're still not going to cut down the query time very much, as it's going to be returning millions of rows.
Yeah, pretty much. If you had an index on (page_is_redirect, page_namespace, page_title) or such, that would speed it up significantly (at least in MySQL). It would have to scan through the whole index, but that only contains three columns plus a row id of some kind, so it should be quite a lot faster than scanning the whole table. But this isn't a likely query for optimization.
Yes, it can be very hard to switch your DBMS, and that's a very good thing for MySQL. :)
Let's not have a DBMS flame war here, please.