[Commons-l] Category intersection: New extension available

Magnus Manske magnusmanske at googlemail.com
Thu Mar 6 21:16:09 UTC 2008


(cross-posting wikitech-l and commons-l)

I created a new extension ("CategoryIntersection") that allows for
quick lookup of pages (and images) in intersecting categories. That
would enable wiki(m|p)edia sites to use categories as tags,
eliminating the need for oh-so-specialized categories.

Intersecting two categories is very fast. Intersecting more than two
is also possible and already implemented; the maximum number can be
limited.

I tried it on my (mostly empty) MediaWiki test setup, and it works
peachy. However, *I NEED HELP* with
* testing it on a large-scale installation
* integrating it with MediaWiki more tightly (database wrappers, caching, etc.)
* Brionizing the code, so it actually has a chance to be used on
Wikipedia and/or Commons


Technical notes:
* This was recently discussed on wikitech-l
* More than two intersections are implemented by nesting subqueries
* Hash values are implemented as VARCHAR(32). Could easily switch to
INTEGER if desirable (less storage, faster lookup, but more false
positives)
* The hash values only yield good candidates (pages that *might*
intersect in these categories). The candidates then have to be checked
in a second pass, which will have to be optimized; database people to
the front!
* Table to store hash values has to be created manually; SQL is in the main file
* I didn't implement code to fill the table for an existing
installation; however, since hash table updates solely hang on the
LinksUpdate hook, this should be easy
* There is no code covering page moves and deletions yet; do those
hang on LinksUpdate as well?
* SQL queries are currently "plain text" and not constructed through
the DB wrappers; I wasn't sure how to do that for the subqueries
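
To make the candidate-then-verify idea from the notes above concrete, here is a minimal in-memory sketch in Python. It is not the extension's code. It assumes, which the notes do not spell out, that each stored hash is an MD5 hex digest (32 characters, matching VARCHAR(32)) computed over a pair of category names. The names pair_hash, IntersectionIndex, update_page, candidates and intersect are all hypothetical, the dictionaries stand in for database tables, and plain set intersection stands in for the nested subqueries.

```python
import hashlib
from itertools import combinations


def pair_hash(cat_a, cat_b):
    """Return a 32-character MD5 hex digest for an unordered category pair.

    Assumption: the hash covers a pair of category names; the separator
    and the sorting are illustrative choices, not taken from the extension.
    """
    first, second = sorted((cat_a, cat_b))
    return hashlib.md5(f"{first}|{second}".encode("utf-8")).hexdigest()


class IntersectionIndex:
    """In-memory stand-in for the hash table plus the category links."""

    def __init__(self):
        self.hash_to_pages = {}    # pair hash -> set of page ids
        self.page_categories = {}  # page id -> set of category names

    def update_page(self, page_id, categories):
        """Refresh all hashes for one page (roughly what a link-update hook would do)."""
        old = sorted(self.page_categories.get(page_id, ()))
        for a, b in combinations(old, 2):
            self.hash_to_pages.get(pair_hash(a, b), set()).discard(page_id)
        self.page_categories[page_id] = set(categories)
        for a, b in combinations(sorted(set(categories)), 2):
            self.hash_to_pages.setdefault(pair_hash(a, b), set()).add(page_id)

    def candidates(self, categories):
        """First pass: pages whose stored hashes match every adjacent pair."""
        cats = sorted(set(categories))
        if len(cats) < 2:
            raise ValueError("need at least two distinct categories")
        result = None
        for a, b in zip(cats, cats[1:]):
            pages = self.hash_to_pages.get(pair_hash(a, b), set())
            result = set(pages) if result is None else result & pages
        return result

    def intersect(self, categories):
        """Second pass: keep only candidates that really are in all categories."""
        wanted = set(categories)
        return sorted(
            page_id
            for page_id in self.candidates(categories)
            if wanted <= self.page_categories[page_id]
        )


if __name__ == "__main__":
    index = IntersectionIndex()
    index.update_page(1, ["Cats", "Black", "Photos"])
    index.update_page(2, ["Cats", "Photos"])
    print(index.intersect(["Cats", "Photos"]))
```

With a full MD5 digest the second pass will rarely reject anything; it matters more if the hash is shortened to an INTEGER, where collisions (and thus false candidates) become more likely.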

Cheers,
Magnus


