Simetrical schreef:
Well, there have been fairly extensive discussions on this list about implementation of category intersections. Software discussions are done here.
MinuteElectron wrote a pretty good implementation of category intersection as an extension [1]. The only downsides I see are: * It uses the LinkUpdater to gradually build the categoryintersections table, but there's no maintenance script to build the entire table at once. I've written one today, but haven't figured out yet how to properly integrate this into an extension (I can't really get the path to the maintenance dir from there; are there any other extensions with CLI scripts around?) * It uses nested queries to intersect three or more categories, and it's hard for me to judge how efficient they are. More about this later. * It doesn't have a clean API to get a category intersection sub-query (this could be written of course, and it should if we're gonna use it)
As to the subquery thing, I'll describe how the extension fetches pages that are in categories A, B and C (all three of them). First, it calculates hashes for A|B, A|C and B|C (will be called hashAB, hashBC and hashAC respectively). Then, it queries the categoryintersections table for pages that have all three hashes, as follows:
SELECT ci_page FROM categoryintersections WHERE ci_hash = 'hashAB' AND ci_page IN ( SELECT ci_page FROM categoryintersections WHERE ci_hash = 'hashBC' AND ci_page IN ( SELECT ci_page FROM categoryintersections WHERE ci_hash = 'hashAC' ) )
I ran an EXPLAIN on it, but I can't really judge if it's bad or good, 'cause I don't know how bad those dependent subqueries are:
id select_type table type possible_keys 1 PRIMARY categoryintersections ref PRIMARY 2 DEPENDENT SUBQUERY categoryintersections eq_ref PRIMARY 3 DEPENDENT SUBQUERY categoryintersections eq_ref PRIMARY
id key key_len ref rows Extra 1 PRIMARY 4 const 2 Using where; Using index 2 PRIMARY 8 const,func 1 Using where; Using index 3 PRIMARY 8 const,func 1 Using where; Using index
For clarification, the structure of the categoryintersections table is as follows: | CREATE TABLE `categoryintersections` ( `ci_page` int(10) unsigned NOT NULL, `ci_hash` int(10) unsigned NOT NULL, PRIMARY KEY (`ci_hash`,`ci_page`) ); | Can someone who knows more about database efficiency than I do comment on this? Also, I'd like to suggest we merge this extension into core (after improving it first), thoughts?
Roan Kattouw (Catrope)