I just had the following thought: For a tag intersection system,
* we limit queries to two intersections (show all pages with categories A and B) * we assume on average 5 categories per page (can someone check that?)
then we have 5*4=20 intersections per page. Now, for each intersection, we calculate MD5("A|B") to get an integer hash, and store that in a new table (page_id INTEGER,intersection_hash INTERGER). That table would be 4 times as long as the categorylinks table. * Memory usage: Acceptable (?) * Update: Fast, on page edit only * Works for non-existing categories
On a query, we look for the (indexed) hash in a subquery, then check those against the actual categorylinks. Looking up an integer in the subquery should be fast enough ;-) Given the number of categories and INTEGER >4bn, that would make the hash unique for all combinations of 65K categories (if the hash were truely randomly distributed, which it isn't), which should mean that the number of false positives (to be filtered by the main query) should be rather low.
If that's fast enough, we could even expand to three intersections (A, B, and C), querying "A|B", "A|C", and "B|C", and let PHP find the ones common to all three sets.
Summary: Fixing slow MySQL lookup by throwing memory at it...
Feasible? Or pipe dream? Magnus