Simetrical and Timwi wrote:
> > Besides, I think you're all forgetting that if we have a table that
> > stores, say, all two-way category intersections, we can actually get rid
> > of the categorylinks table itself -- it would be contained within that
> > new table and would be wholly redundant. Similarly, a table with all
> > three-way intersections contains in it all two-way intersections as well.
>
> Yeah, but then surely you'd have to take the union of a potentially
> large number of tables to display a single-category view, which I
> suspect is going to remain a more common request than a
> category-intersection view. Isn't that going to give you a
> substantial performance hit for large categories?
>
> (And incidentally, pages that are only in a single category will have
> to remain in their own table. You can't get that from intersection
> tables.)
>
Whoa - intersection "tables"? Why more than one? You lost me. Also,
I figured you could get higher order intersections from lower ones,
but how do you go the other way? And, I second the question about the
performance hit, even if you can.
I was picturing adding the intersections to the existing table - like
this, suppose you have an article in "Living people", "People from
Maine", "American horror writers" - it would also get the category
intersections of "Living people ~ People from Maine", "People from
Maine ~ American horror writers", etc. You would not need "American
horror writers ~ People from Maine" because it is the same
intersection. So I can see how you could intersect "Living people"
and "People from Maine ~ American horror writers" and get the
intersection of all three - but you would still need to store "Living
people ~ People from Maine" AND "People from Maine ~ American horror
writers" - it doesn't save you from having to store every combination
of 2 categories.
The plus side is this lets you calculate the intersection of 4
categories dynamically, or any 3, or any 2, with only storing the a
binomial coefficient (n choose k, where n is the number of categories
and k=2) additional records, and the retrieval of articles at the
intersection of only two categories will be fast. The logic for what
to retrievee and store gets a little convoluted, but not too bad... is
the performance improvement worth it?
I don't know enough about SQL to talk intelligently about the speed of
retrieving one record in two thousand vs. two million, but I agree,
some testing is called for here. If someone can point me where I can
download the categorylinks table (*only* preferably) I'll get it and
throw some sql at it. It'll take me a couple of days to get around
to.
Best Regards,
Aerik