Simetrical schreef:
Well, there have been fairly extensive discussions on this list about
implementation of category intersections. Software discussions are
done here.
MinuteElectron wrote a pretty good implementation of category
intersection as an extension [1]. The only downsides I see are:
* It uses the LinkUpdater to gradually build the categoryintersections
table, but there's no maintenance script to build the entire table at
once. I've written one today, but haven't figured out yet how to
properly integrate this into an extension (I can't really get the path
to the maintenance dir from there; are there any other extensions with
CLI scripts around?)
* It uses nested queries to intersect three or more categories, and it's
hard for me to judge how efficient they are. More about this later.
* It doesn't have a clean API to get a category intersection sub-query
(this could be written of course, and it should if we're gonna use it)
As to the subquery thing, I'll describe how the extension fetches pages
that are in categories A, B and C (all three of them). First, it
calculates hashes for A|B, A|C and B|C (will be called hashAB, hashBC
and hashAC respectively). Then, it queries the categoryintersections
table for pages that have all three hashes, as follows:
SELECT ci_page FROM categoryintersections
WHERE ci_hash = 'hashAB' AND ci_page IN (
SELECT ci_page FROM categoryintersections
WHERE ci_hash = 'hashBC' AND ci_page IN (
SELECT ci_page FROM categoryintersections
WHERE ci_hash = 'hashAC'
)
)
I ran an EXPLAIN on it, but I can't really judge if it's bad or good,
'cause I don't know how bad those dependent subqueries are:
id select_type table type
possible_keys
1 PRIMARY categoryintersections ref PRIMARY
2 DEPENDENT SUBQUERY categoryintersections eq_ref PRIMARY
3 DEPENDENT SUBQUERY categoryintersections eq_ref PRIMARY
id key key_len ref rows Extra
1 PRIMARY 4 const 2 Using where; Using index
2 PRIMARY 8 const,func 1 Using where; Using index
3 PRIMARY 8 const,func 1 Using where; Using index
For clarification, the structure of the categoryintersections table is
as follows:
|
CREATE TABLE `categoryintersections` (
`ci_page` int(10) unsigned NOT NULL,
`ci_hash` int(10) unsigned NOT NULL,
PRIMARY KEY (`ci_hash`,`ci_page`)
);
|
Can someone who knows more about database efficiency than I do comment
on this? Also, I'd like to suggest we merge this extension into core
(after improving it first), thoughts?
Roan Kattouw (Catrope)