On 9/14/06, Timwi <timwi@gmx.net> wrote:
> The reason we think the SQL is too inefficient is that there can be large numbers of articles in one category. If we store all intersections of two categories, which is easy to do and doesn't take much space at all, then each intersection will likely be small enough to make the SQL for three-way intersections economical, so higher-order intersections need not be stored directly.
Well, then, is anyone up for writing and benchmarking this idea? :)
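
As a starting point for such a benchmark, here is a minimal sketch of the quoted proposal using Python's sqlite3 module. The `catpairs` table and its `cp_*` columns are hypothetical names invented for illustration; `categorylinks` is reduced to two columns. A three-way intersection is answered by looking up one stored pair and filtering that (hopefully small) set against the third category.

```python
import sqlite3
from itertools import combinations

def build(conn, links):
    """Load (page_id, category) pairs and derive all two-way intersections."""
    conn.executescript("""
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        CREATE INDEX cl_lookup ON categorylinks (cl_from, cl_to);
        -- Hypothetical table: one row per page per pair of its categories.
        CREATE TABLE catpairs (cp_from INTEGER, cp_a TEXT, cp_b TEXT);
        CREATE INDEX cp_lookup ON catpairs (cp_a, cp_b);
    """)
    links = sorted(set(links))
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", links)
    by_page = {}
    for page, cat in links:
        by_page.setdefault(page, []).append(cat)
    for page, cats in by_page.items():
        # cats is already sorted, so every stored pair has cp_a < cp_b.
        conn.executemany(
            "INSERT INTO catpairs VALUES (?, ?, ?)",
            [(page, a, b) for a, b in combinations(cats, 2)],
        )

def three_way(conn, c1, c2, c3):
    """Pages in all three categories: one stored pair joined to categorylinks."""
    a, b, c = sorted([c1, c2, c3])
    rows = conn.execute(
        """
        SELECT p.cp_from
        FROM catpairs p
        JOIN categorylinks l ON l.cl_from = p.cp_from
        WHERE p.cp_a = ? AND p.cp_b = ? AND l.cl_to = ?
        ORDER BY p.cp_from
        """,
        (a, b, c),
    )
    return [r[0] for r in rows]
```

This sketch always uses the alphabetically first pair for the lookup. A real implementation would presumably want to pick the smallest of the three stored pairs, which is one of the things a benchmark would have to settle.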
> Besides, I think you're all forgetting that if we have a table that stores, say, all two-way category intersections, we can actually get rid of the categorylinks table itself -- it would be contained within that new table and would be wholly redundant. Similarly, a table with all three-way intersections contains in it all two-way intersections as well.
Yeah, but then surely you'd have to take the union of a potentially large number of tables to display a single-category view, which I suspect is going to remain a more common request than a category-intersection view. Isn't that going to give you a substantial performance hit for large categories?
(And incidentally, pages that are only in a single category will have to remain in their own table. You can't get that from intersection tables.)
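
To make both objections concrete, here is a small sketch in plain Python (no database; `pair_rows` and `single_category_view` are made-up helper names). It rebuilds a single-category view from two-way intersection rows alone: the view has to scan every pair that mentions the category, and a page that belongs to only one category never shows up.

```python
from itertools import combinations

def pair_rows(page_cats):
    """All (page, cat_a, cat_b) rows a two-way intersection table would hold."""
    return {
        (page, a, b)
        for page, cats in page_cats.items()
        for a, b in combinations(sorted(cats), 2)
    }

def single_category_view(rows, cat):
    """Union over every stored pair that mentions `cat`."""
    return sorted({page for page, a, b in rows if cat in (a, b)})

page_cats = {1: {"A", "B"}, 2: {"A"}, 3: {"A", "C"}}
rows = pair_rows(page_cats)
view = single_category_view(rows, "A")
# Page 2 is in category A but has no pair rows, so it is missing from the view.
```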
> Lastly, you're also forgetting that the size of the table is irrelevant as long as the hardware to store it is available.
Not quite irrelevant. It becomes more expensive to move it around, download it, and so on. A factor of 1.5 or 2, who cares, but a factor of five or ten could be sort of inconvenient.
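
For a rough feel of where that factor comes from: a page in n categories has n categorylinks rows but n(n-1)/2 two-way intersection rows. The sketch below compares row counts only, under the assumption that row widths are similar (pair rows would actually be somewhat wider); `row_count_factor` is a made-up helper, not existing code.

```python
from math import comb

def row_count_factor(cats_per_page):
    """Ratio of two-way intersection rows to categorylinks rows.

    cats_per_page: number of categories on each page, one entry per page.
    """
    link_rows = sum(cats_per_page)
    pair_rows = sum(comb(n, 2) for n in cats_per_page)
    return pair_rows / link_rows if link_rows else 0.0
```

With three categories on every page the two tables have the same number of rows; the factor only reaches five once pages average eleven categories each.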
More to the point: if you're maintaining many tables in parallel like this, you need that many more DB calls. If I add a category to a page with a dozen categories, it will need to update the dozen intersection tables instead of one categorylinks table. That's why I suggested only maintaining some of the most commonly-requested intersection tables, rather than all of them; there's a tradeoff here between time for reads and writes. Benchmarking is needed to know which side to lean toward more.
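
The write side of that tradeoff can be counted directly. This is only a sketch of the arithmetic, with `new_rows` a hypothetical helper: adding one category to a page that already has n categories creates one new k-way intersection row for every way of choosing the other k-1 categories from the existing n.

```python
from math import comb

def new_rows(existing_cats, order):
    """Rows written when one category is added to a page.

    order=1 is plain categorylinks, order=2 is two-way intersections,
    order=3 is three-way intersections, and so on.
    """
    return comb(existing_cats, order - 1)

# A page with a dozen categories gaining one more:
writes = {order: new_rows(12, order) for order in (1, 2, 3)}
```

So the dozen-category case costs 1 write against categorylinks, 12 against two-way intersections, and 66 if three-way intersections were stored as well (the three-way figure is my extrapolation, not something anyone has proposed storing in full).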