On Sat, Nov 21, 2009 at 7:02 PM, Anthony <wikimail(a)inbox.org> wrote:
You could build an index on page_is_redirect in the
"page" table (see
enwiki-*-page.sql.gz). But I'm pretty sure Postgresql wouldn't use
it, and would do a sequential scan, since pretty much all the pages
are going to have to be accessed anyway.
Five or ten minutes sounds about right. I can't imagine this is a
query you want to run over and over again. If it is, you'd probably
want to use partitioning
(
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html),
but you're still not going to cut down the query time very much, as
it's going to be returning millions of rows.
Yeah, pretty much. If you had an index on (page_is_redirect,
page_namespace, page_title) or such, that would speed it up
significantly (at least in MySQL). It would have to scan through the
whole index, but that only contains three columns plus a row id of
some kind, so it should be quite a lot faster than scanning the whole
table. But this isn't a likely query for optimization.
Yes, it can be very hard to switch your DBMS, and
that's a very good
thing for MySQL. :)
Let's not have a DBMS flame war here, please.