On Fri, 29 Feb 2008 10:28:14 +0000, Magnus Manske wrote:
I just had the following thought: For a tag intersection system, if
* we limit queries to two intersections (show all pages with categories A and B)
* we assume on average 5 categories per page (can someone check that?)
then we have 5*4=20 ordered intersections per page (counting both "A|B" and "B|A").
Now, for each intersection, we calculate MD5("A|B") to get an integer
hash, and store that in a new table (page_id INTEGER, intersection_hash
INTEGER).
That table would be 4 times as long as the categorylinks table.
* Memory usage: Acceptable (?)
* Update: Fast, on page edit only
* Works for non-existing categories
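
For illustration, here is a rough Python sketch of the hashing scheme described above. It is not taken from any existing code: the function names are made up, and truncating the MD5 digest to its first 8 bytes is just one assumed way of turning it into an integer. Both orderings of each pair are generated, which matches the 5*4=20 estimate.

```python
import hashlib
from itertools import permutations

def intersection_hash(cat_a, cat_b):
    """Integer hash of the ordered pair "A|B" (assumption: first 8 bytes of MD5)."""
    digest = hashlib.md5(f"{cat_a}|{cat_b}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def intersection_rows(page_id, categories):
    """Rows (page_id, intersection_hash) for every ordered pair of distinct categories."""
    unique = sorted(set(categories))
    return [(page_id, intersection_hash(a, b)) for a, b in permutations(unique, 2)]

# Hypothetical page 42 with five categories gives 5*4 = 20 rows.
rows = intersection_rows(
    42, ["Physics", "Germany", "1905 births", "Nobel laureates", "Violinists"]
)
```

A query for pages in both A and B would then look up intersection_hash("A", "B") in the new table. Because the category names are hashed directly, this works even if no category page exists, at the cost of possible (rare) hash collisions.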
How important are intersections for non-existent categories? Without
them, we could use something like (page_id int, cat_intersect bigint) or
(page_id int, cat1 int, cat2 int) to get two-category intersections
without collisions, and maybe even scale up by defining n-way
intersections recursively, still without collisions.
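
To make the (page_id int, cat_intersect bigint) variant concrete, here is a small Python sketch. It assumes every category has a numeric id that fits in 32 bits (so only existing categories can be used); pack_pair and unpack_pair are hypothetical names. Sorting the two ids first means one row per unordered pair is enough.

```python
def pack_pair(cat1_id, cat2_id):
    """Pack two 32-bit category ids into one 64-bit value, smaller id in the high bits."""
    lo, hi = sorted((cat1_id, cat2_id))
    if lo < 0 or hi >= 2**32:
        raise ValueError("category ids must fit in 32 unsigned bits")
    return (lo << 32) | hi

def unpack_pair(packed):
    """Recover the two category ids from a packed value."""
    return packed >> 32, packed & 0xFFFFFFFF

# The packing is reversible, so two different pairs can never collide.
key = pack_pair(7, 3)
```

One caveat: if ids can reach 2^31 or more, the packed value no longer fits in a signed 64-bit column, so the column would need to be unsigned.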