On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
Yeah, pretty much. If you had an index on (page_is_redirect, page_namespace, page_title) or such, that would speed it up significantly (at least in MySQL). It would have to scan through the whole index, but that only contains three columns plus a row id of some kind, so it should be quite a lot faster than scanning the whole table.
Quite a lot? The theoretical max would be about twice as fast, as (page_is_redirect, page_namespace, page_title) is going to take up at least half as much space as the whole page table. But I'm not sure even that theoretical max could be reached by MySQL.
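To make the covering-index idea above concrete, here's a minimal sketch using SQLite from Python (not MySQL, and with a toy schema invented for illustration; MediaWiki's real `page` table has more columns). The point is the same in any engine with covering-index support: once the index contains every column the query touches, the planner can answer from the index alone and never read the wide table rows.

```python
import sqlite3

# Toy `page` table, loosely modeled on the columns named in the thread;
# `page_text` stands in for the wide columns that make full-table scans slow.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL,
        page_is_redirect INTEGER NOT NULL,
        page_text TEXT
    )
""")

# The index discussed above: it covers every column the query needs.
conn.execute("""
    CREATE INDEX page_redirect_ns_title
    ON page (page_is_redirect, page_namespace, page_title)
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT page_namespace, page_title
    FROM page
    WHERE page_is_redirect = 1
""").fetchall()

# SQLite reports "COVERING INDEX" when the index alone answers the query,
# i.e. the table's rows are never fetched at all.
print(plan)
```

Whether this beats a table scan by more than the ~2x ceiling depends on how wide the full rows are relative to the three indexed columns, which is exactly the point being argued above.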
I know it wouldn't be reached by PostgreSQL, which would still do a sequential scan of the table. If you clustered on page_is_redirect you'd save yourself from scanning the part of the table that holds the redirects, but you're still stuck with a sequential scan.
In either database, if you really wanted the absolute fastest solution, you'd create a materialized view for exactly that query. But as you said, "this isn't a likely query for optimization."
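Since SQLite (used here just for a runnable sketch) has no materialized views, the same effect can be faked with a plain table plus a refresh step; the exact query is not shown in the thread, so a redirect count is assumed as a stand-in. PostgreSQL gained native `CREATE MATERIALIZED VIEW` later (9.3); in 2009 this hand-rolled approach was the usual workaround there too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL,
        page_is_redirect INTEGER NOT NULL
    );
    -- Hand-rolled "materialized view": a table holding the precomputed answer.
    CREATE TABLE redirect_count (n INTEGER NOT NULL);
""")

def refresh_redirect_count(conn):
    """Recompute the materialized result from scratch (a periodic job or
    trigger would call this whenever staleness matters)."""
    with conn:
        conn.execute("DELETE FROM redirect_count")
        conn.execute("""
            INSERT INTO redirect_count
            SELECT COUNT(*) FROM page WHERE page_is_redirect = 1
        """)

conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_is_redirect) "
    "VALUES (?, ?, ?)",
    [(0, "Foo", 0), (0, "Bar", 1), (1, "Baz", 1)],
)
refresh_redirect_count(conn)

# Reading the precomputed row does no scan at all at query time.
(count,) = conn.execute("SELECT n FROM redirect_count").fetchone()
print(count)  # 2
```

The trade-off is the usual one: the fastest possible read, paid for with refresh cost and staleness between refreshes.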
Yes, it can be very hard to switch your DBMS, and that's a very good thing for MySQL. :)
Let's not have a DBMS flame war here, please.
Aww, c'mon, just a little light ribbing... I couldn't resist, you set me right up for it.
I'm actually in the process of trying to import enwiki into a PostgreSQL database right now. Attempt 1 was to import everything into a MySQL database (with no indexes), export it as a TSV file, then import from the TSV file into PostgreSQL. I hit a snag with some data that PostgreSQL says isn't valid UTF-8, which is probably due to something I did wrong with the import, but I can't figure out what.
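One way to narrow that kind of snag down is to find exactly which lines and bytes of the TSV fail UTF-8 validation before handing the file to PostgreSQL. A minimal sketch (the file contents and helper name here are made up for the demo; a common culprit is latin-1 bytes that were never really UTF-8 in the MySQL side):

```python
import os
import tempfile

def find_invalid_utf8_lines(path):
    """Yield (line_number, byte_offset_of_first_bad_byte) for each line
    that would fail PostgreSQL's UTF8 encoding check."""
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                yield lineno, e.start

# Demo on a temp file standing in for the dump: line 2 contains a lone
# 0xE9 byte ('e-acute' in latin-1), which is not valid UTF-8 on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"page_title\tgood\n")
    tmp.write(b"page_title\tcaf\xe9\n")
    path = tmp.name

bad = list(find_invalid_utf8_lines(path))
print(bad)  # [(2, 14)]
os.remove(path)
```

Once the offending rows are identified, it's much easier to tell whether the problem is the export step (e.g. wrong connection charset on the MySQL side) or genuinely corrupt source data.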