On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor
<Simetrical+wikilist(a)gmail.com> wrote:
> Yeah, pretty much. If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL). It would have to scan through the
> whole index, but that only contains three columns plus a row id of
> some kind, so it should be quite a lot faster than scanning the whole
> table.
Quite a lot? The theoretical max would be about twice as fast, as
(page_is_redirect, page_namespace, page_title) is going to take up at
least half as much space as the whole page table. But I'm not sure
even that theoretical max could be reached by MySQL.
I know it wouldn't be reached by PostgreSQL, which would still do a
sequential scan through the table. If you clustered on
page_is_redirect you'd save yourself from having to go through the
parts of the table which were redirects, but you're still stuck with a
sequential scan.
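The covering-index effect is easy to see in miniature. Here's a sketch using SQLite (just a stand-in for illustration, not MySQL or PostgreSQL, and the simplified page schema and row counts are made up); the point is that once the index holds every column the query touches, the planner stops reading the table at all:

```python
import sqlite3

# Toy stand-in for MediaWiki's page table (simplified, invented schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT,
        page_is_redirect INTEGER,
        page_filler TEXT   -- padding so a full-table scan has real weight
    )
""")
conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_is_redirect, page_filler)"
    " VALUES (?, ?, ?, ?)",
    [(0, "Title_%d" % i, i % 10 == 0, "x" * 100) for i in range(1000)],
)

query = "SELECT page_namespace, page_title FROM page WHERE page_is_redirect = 0"

# Without the index: the plan is a scan of the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# With an index on (page_is_redirect, page_namespace, page_title), every
# column the query needs is in the index, so the plan reads only the
# index ("covering index") and never touches the table rows.
conn.execute(
    "CREATE INDEX page_redirect_ns_title"
    " ON page (page_is_redirect, page_namespace, page_title)"
)
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)  # a table scan
print(plan_after)   # a search using the covering index
```

(SQLite, like MySQL, can satisfy the query from the index alone; 2009-era PostgreSQL can't, since it has to visit the heap for visibility checks, which is the sequential-scan point above.)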
In either database, if you really wanted the absolute fastest
solution, you'd create a materialized view for exactly that query.
But as you said, "this isn't a likely query for optimization."
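Since PostgreSQL (as of the 8.x line) has no native CREATE MATERIALIZED VIEW, "materialized view" here really means a summary table you keep current yourself, typically with triggers. A minimal sketch of the idea using SQLite triggers (the page_counts table and trigger name are invented for the example, and a real setup would also handle UPDATE and DELETE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT,
        page_is_redirect INTEGER
    );

    -- Hand-rolled "materialized view": non-redirect page count per namespace.
    CREATE TABLE page_counts (
        page_namespace INTEGER PRIMARY KEY,
        pages INTEGER NOT NULL
    );

    -- Keep the summary current as rows arrive; only INSERT is covered here.
    CREATE TRIGGER page_ins AFTER INSERT ON page
    WHEN NEW.page_is_redirect = 0
    BEGIN
        INSERT OR IGNORE INTO page_counts (page_namespace, pages)
            VALUES (NEW.page_namespace, 0);
        UPDATE page_counts SET pages = pages + 1
            WHERE page_namespace = NEW.page_namespace;
    END;
""")

conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_is_redirect)"
    " VALUES (?, ?, ?)",
    [(0, "A", 0), (0, "B", 0), (0, "C", 1), (1, "Talk:A", 0)],
)

# The expensive aggregate over the whole table becomes a cheap lookup.
print(conn.execute(
    "SELECT page_namespace, pages FROM page_counts ORDER BY page_namespace"
).fetchall())
# [(0, 2), (1, 1)]
```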
>> Yes, it can be very hard to switch your DBMS, and that's a very good
>> thing for MySQL. :)
>
> Let's not have a DBMS flame war here, please.
Aww, c'mon, just a little light ribbing... I couldn't resist, you set
me right up for it.
I'm actually in the process of trying to import enwiki into a
PostgreSQL database right now. Attempt 1 was to import everything
into a MySQL database (with no indexes), export it as a TSV file, then
import from the TSV file into PostgreSQL. I hit a snag with some data
that PostgreSQL says isn't valid UTF-8, which is probably due to
something I did wrong with the import, but I can't figure out what it
is.
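For what it's worth, one way to hunt down the offending rows before PostgreSQL rejects the whole COPY is to scan the TSV in binary mode for lines that don't decode as UTF-8. A quick sketch (the file path is whatever your dump is actually called; a common culprit for this symptom is latin-1 bytes leaking out of MySQL's default charset, though that's only a guess here):

```python
import sys

def find_invalid_utf8(path):
    """Return (line number, byte offset, offending bytes) for each line
    in the file whose raw bytes are not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start is the byte offset of the first invalid byte.
                bad.append((lineno, e.start, raw[e.start:e.start + 4]))
    return bad

if __name__ == "__main__":
    for lineno, offset, context in find_invalid_utf8(sys.argv[1]):
        print("line %d: invalid byte at offset %d: %r" % (lineno, offset, context))
```

Running that over the TSV should at least tell you which rows PostgreSQL is choking on, so you can diff them against the original MySQL data.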