On Fri, Feb 29, 2008 at 2:50 PM, Steve Sanbeg ssanbeg@ask.com wrote:
How important are intersections for nonexistent categories? Without them, we could have something like (page_id int, cat_intersect bigint) or (page_id int, cat1 int, cat2 int) to get two-category intersections without collisions, and maybe even scale up by defining n-way intersections recursively, still without collisions.
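The message doesn't say how the `cat_intersect` bigint would be built. Here is one collision-free option, sketched under the assumption that categories had unsigned 32-bit integer IDs (which, as the reply points out, they don't yet). It puts the smaller ID in the high 32 bits and the larger in the low 32 bits. The function names `intersect_key` and `unpack_key` are hypothetical. The largest possible key is just under 2^64, so the column would have to be an unsigned 64-bit type such as MySQL's BIGINT UNSIGNED.

```python
# Hypothetical sketch: a collision-free 64-bit key for a two-category
# intersection, ASSUMING categories have unsigned 32-bit integer IDs.

MASK32 = 0xFFFFFFFF


def intersect_key(cat_a: int, cat_b: int) -> int:
    """Pack two 32-bit category IDs into one 64-bit value.

    The pair is sorted first so "A AND B" and "B AND A" map to the same key.
    Each ID gets its own 32 bits, so distinct pairs can never collide.
    """
    for cat in (cat_a, cat_b):
        if not 0 <= cat <= MASK32:
            raise ValueError(f"category ID {cat} does not fit in 32 bits")
    lo, hi = sorted((cat_a, cat_b))
    return (lo << 32) | hi


def unpack_key(key: int) -> tuple[int, int]:
    """Recover the (smaller, larger) category ID pair from a packed key."""
    return key >> 32, key & MASK32


print(intersect_key(7, 3), intersect_key(3, 7))  # both print 12884901895
```

Sorting the pair wastes no space: because the smaller ID is always on top, every unordered pair of categories gets exactly one key, and `unpack_key` recovers it.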
Maybe, except we don't have category IDs. If we did, there would logically be no such thing as a nonexistent category: there would be categories with no associated article pages, but they would still have category IDs. Perhaps you're proposing we use article IDs instead, but currently categories don't need any article page associated with them, and I'm not sure it's worth changing that.
On Fri, Feb 29, 2008 at 2:59 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
How fast are ANDs in SELECT ... WHERE clauses? I would guess it's quicker to search by a hash than by two ints.
It makes no difference, even if category IDs existed (which they should, and sooner or later will). It's a sub-millisecond query either way. A B-tree index on (page, 32-bit cat1, 32-bit cat2) would have exactly the same cardinality as a B-tree index on (page, 64-bit hash), assuming the hash has no collisions, and its keys would be the same length, so traversing either should take the same time, I'd imagine. (But I don't know exactly how composite indexes are stored, or anything about B-trees beyond the most basic things.)
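The cardinality and key-length argument can be checked with a toy model. The sketch below uses made-up rows and sorted Python lists searched with `bisect` as a stand-in for a B-tree, which is a big simplification of any real storage engine. It compares an index on (page, cat1, cat2) with one on (page, packed), where `packed` is a hypothetical collision-free 64-bit packing of the two IDs rather than a true hash. Both keys encode to the same 8 bytes, sort identically, and have one entry per row. A real hash would scramble the sort order, but for equality lookups the entry count and key width are likely what shape the tree, which is the point made above.

```python
# Toy model (assumption): sorted lists + bisect stand in for B-tree indexes.
import bisect
import random
import struct

MAX32 = (1 << 32) - 1


def pack(cat1: int, cat2: int) -> int:
    """Hypothetical packing: cat1 in the high 32 bits, cat2 in the low 32."""
    return (cat1 << 32) | cat2


def composite_key_bytes(cat1: int, cat2: int) -> bytes:
    """Two unsigned 32-bit columns stored back to back, big-endian."""
    return struct.pack(">II", cat1, cat2)


def packed_key_bytes(cat1: int, cat2: int) -> bytes:
    """One unsigned 64-bit column, big-endian."""
    return struct.pack(">Q", pack(cat1, cat2))


def find(index: list, key: tuple) -> bool:
    """Binary search, standing in for a B-tree descent."""
    i = bisect.bisect_left(index, key)
    return i < len(index) and index[i] == key


random.seed(42)  # made-up sample data, reproducible
rows = {
    (random.randrange(1000), random.randrange(MAX32 + 1), random.randrange(MAX32 + 1))
    for _ in range(10_000)
}

composite_index = sorted(rows)                                   # (page, cat1, cat2)
packed_index = sorted((p, pack(c1, c2)) for p, c1, c2 in rows)   # (page, packed)

# Same key width: the two encodings are byte-for-byte identical.
for _, c1, c2 in composite_index[:100]:
    assert composite_key_bytes(c1, c2) == packed_key_bytes(c1, c2)

# Same cardinality and same order: packing is one-to-one and order-preserving.
assert [(p, pack(c1, c2)) for p, c1, c2 in composite_index] == packed_index

# An exact-match lookup lands at the same position in both indexes.
page, c1, c2 = composite_index[1234]
assert bisect.bisect_left(composite_index, (page, c1, c2)) == \
    bisect.bisect_left(packed_index, (page, pack(c1, c2)))

print("both index layouts agree on size, order, and lookups")
```

Because big-endian bytes compare the same way the integers do, a storage engine comparing raw key bytes would see no difference between the two layouts in this model.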