Re: [Wikitech-l] Category Intersections

15 Sep 2006


      Simetrical and Timwi wrote:
...
...
Besides, I think you're all forgetting that if we have a table that
stores, say, all two-way category intersections, we can actually get rid
of the categorylinks table itself -- it would be contained within that
new table and would be wholly redundant. Similarly, a table with all
three-way intersections contains in it all two-way intersections as well.
Yeah, but then surely you'd have to take the union of a potentially
large number of tables to display a single-category view, which I
suspect is going to remain a more common request than a
category-intersection view.  Isn't that going to give you a
substantial performance hit for large categories?
(And incidentally, pages that are only in a single category will have
to remain in their own table.  You can't get that from intersection
tables.)
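
A minimal sketch of that last point, assuming a toy in-memory model rather than the real schema. The page titles and the helper names (build_pair_table, single_category_view) are hypothetical. It rebuilds a single-category listing as the union of every pair row that mentions the category, and shows that a page sitting in only one category never turns up.

```python
from itertools import combinations

# Hypothetical sample data: page title -> categories it belongs to.
page_categories = {
    "Page A": ["Living people", "People from Maine", "American horror writers"],
    "Page B": ["Living people"],  # belongs to exactly one category
}

def build_pair_table(pages):
    """Map each unordered category pair to the set of pages in both."""
    pairs = {}
    for page, cats in pages.items():
        for pair in combinations(sorted(set(cats)), 2):
            pairs.setdefault(pair, set()).add(page)
    return pairs

def single_category_view(pairs, category):
    """Union of every pair row that mentions the given category."""
    members = set()
    for pair, pages in pairs.items():
        if category in pair:
            members |= pages
    return members

pairs = build_pair_table(page_categories)
from_pairs = single_category_view(pairs, "Living people")
actual = {p for p, cats in page_categories.items() if "Living people" in cats}
missing = actual - from_pairs  # pages the pair table cannot recover
print(sorted(missing))
```

The union also has to scan one pair row per partner category, which is the cost the performance question above is about.
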
Whoa - intersection "tables"?  Why more than one?  You lost me.  Also,
I figured you could get higher order intersections from lower ones,
but how do you go the other way?  And, I second the question about the
performance hit, even if you can.
I was picturing adding the intersections to the existing table, like
this: suppose you have an article in "Living people", "People from
Maine", "American horror writers" - it would also get the category
intersections of "Living people ~ People from Maine", "People from
Maine ~ American horror writers", etc.  You would not need "American
horror writers ~ People from Maine" because it is the same
intersection.  So I can see how you could intersect "Living people"
and "People from Maine ~ American horror writers" and get the
intersection of all three - but you would still need to store "Living
people ~ People from Maine" AND "People from Maine ~ American horror
writers" - it doesn't save you from having to store every combination
of 2 categories.
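
One way to make sure each pair is stored only once is to put the two names in a fixed order before joining them. The sketch below assumes alphabetical ordering and reuses the " ~ " separator from the example; the function name pair_records is hypothetical, and alphabetical order is only one possible convention (it yields "American horror writers ~ People from Maine" rather than the reverse spelling used above).

```python
from itertools import combinations

def pair_records(categories, sep=" ~ "):
    """Return one canonical record per unordered pair of categories.

    Sorting first means (A, B) and (B, A) always produce the same
    string, so the reversed duplicate is never generated.
    """
    cats = sorted(set(categories))
    return [a + sep + b for a, b in combinations(cats, 2)]

records = pair_records(
    ["Living people", "People from Maine", "American horror writers"]
)
for record in records:
    print(record)
```

Passing the same categories in any other order gives the same three records.
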
The plus side is that this lets you calculate the intersection of 4
categories dynamically, or any 3, or any 2, while storing only a
binomial coefficient's worth of additional records per article (n
choose k, where n is the number of categories the article is in and
k=2), and the retrieval of articles at the intersection of only two
categories will be fast.  The logic for what to retrieve and store
gets a little convoluted, but not too bad... is the performance
improvement worth it?
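
A rough sketch of that retrieval logic, assuming a toy in-memory index that holds both the ordinary single-category rows and the stored pair rows. The page titles and the names build_index and intersect are hypothetical. A request for k categories is covered with stored pairs, plus one single-category row when k is odd, and the resulting sets are intersected.

```python
from itertools import combinations
from math import comb

# Hypothetical sample data: page title -> set of categories.
page_categories = {
    "Page A": {"Living people", "People from Maine", "American horror writers"},
    "Page B": {"Living people", "People from Maine"},
    "Page C": {"Living people", "American horror writers"},
}

def build_index(pages):
    """Index pages by single category and by sorted category pair."""
    index = {}
    for page, cats in pages.items():
        for cat in cats:
            index.setdefault((cat,), set()).add(page)
        for pair in combinations(sorted(cats), 2):
            index.setdefault(pair, set()).add(page)
    return index

def intersect(index, categories):
    """Pages in all given categories, using stored pair rows where possible."""
    cats = sorted(set(categories))
    if not cats:
        raise ValueError("need at least one category")
    # Cover the request with disjoint pairs; an odd leftover uses its single row.
    keys = [tuple(cats[i:i + 2]) for i in range(0, len(cats) - 1, 2)]
    if len(cats) % 2:
        keys.append((cats[-1],))
    result = None
    for key in keys:
        rows = index.get(key, set())
        result = set(rows) if result is None else result & rows
    return result

index = build_index(page_categories)
three_way = intersect(
    index, ["Living people", "People from Maine", "American horror writers"]
)
two_way = intersect(index, ["Living people", "People from Maine"])
extra_rows_for_page_a = comb(3, 2)  # n choose 2 with n = 3 categories
```

The two-category case is a single lookup, while three or four categories cost one set intersection on top of that.
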
I don't know enough about SQL to talk intelligently about the speed of
retrieving one record in two thousand vs. two million, but I agree,
some testing is called for here.  If someone can point me to where I
can download the categorylinks table (*only* that table, preferably)
I'll get it and throw some SQL at it.  It'll take me a couple of days
to get around to it.
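
For reference, a baseline to test against could be a plain self-join on categorylinks with no intersection rows at all. The sketch below is only an assumption-laden illustration: it uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, keeps just the cl_from (page id) and cl_to (category name) columns, and the sample rows and the function name intersect_two are invented. Timings from it would not carry over to the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for categorylinks: only the two columns used here.
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.execute("CREATE INDEX cl_to_from ON categorylinks (cl_to, cl_from)")

# Invented sample rows: (page id, category name).
rows = [
    (1, "Living_people"),
    (1, "People_from_Maine"),
    (2, "Living_people"),
    (3, "People_from_Maine"),
]
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", rows)

def intersect_two(conn, cat_a, cat_b):
    """Return ids of pages that are in both categories via a self-join."""
    sql = """
        SELECT a.cl_from
        FROM categorylinks AS a
        JOIN categorylinks AS b ON a.cl_from = b.cl_from
        WHERE a.cl_to = ? AND b.cl_to = ?
        ORDER BY a.cl_from
    """
    return [row[0] for row in conn.execute(sql, (cat_a, cat_b))]

result = intersect_two(conn, "Living_people", "People_from_Maine")
print(result)
```
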
Best Regards,
Aerik
