Okay, I've got some better numbers (and a better method of building the category intersections table from an existing database).
I build a table with 2.6 million records, all the pages from the snapshot I had. It's 90% populated with categories (this will make more sense in a minute) and I'm still getting good results from intersections queries. For instance:
Showing rows 0 - 1 (2 total, Query took 0.3459 sec) SQL query: SELECT * FROM `pageindex` WHERE MATCH (catlist ) AGAINST ( '+Living_People +People_from_Maine +1956_births' IN BOOLEAN MODE ) LIMIT 0 , 30
(which has two results, btw, Cynthia McFadden and David Kelley).
This takes consistently a third of a second. Not bad. And that's for 2.6million rows. I think we should restrict it to just current articles in the main namespace, but that's my opinion.
I figured out a better way to populate the table, too:
UPDATE pageindex,categorylinks SET pageindex.catlist=CONCAT_WS(' ', pageindex.catlist,categorylinks.cl_to) WHERE ( pageindex.pageid=categorylinks.cl_from AND INSTR(pageindex.catlist, categorylinks.cl_to) =0)
You have to run this query for a number of times equal to the greatest number of distinct categories a page has. Most pages have less than 7, but a few apparently have outrageous numbers. After doing a lot of searching around for a solution to concatenating multiple rows into one value, I think this is pretty good.
Aerik