[Commons-l] Category intersection: New extension available

bawolff bawolff+wn at gmail.com
Mon Mar 10 05:49:43 UTC 2008

> Message: 4
> Date: Thu, 6 Mar 2008 21:16:09 +0000
> From: "Magnus Manske" <magnusmanske at googlemail.com>
> Subject: [Commons-l] Category intersection: New extension available
> To: "Wikimedia developers" <wikitech-l at lists.wikimedia.org>,
>       "Wikimedia Commons Discussion List" <commons-l at lists.wikimedia.org>
> (cross-posting wikitech-l and commons-l)
> I created a new extension ("CategoryIntersection") that allows for
> quick lookup of pages (and images) in intersecting categories. That
> would enable wiki(m|p)edia sites to use categories as tags,
> eliminating the need for oh-so-specialized categories.
> Intersection of two categories works very fast, but intersecting more
> categories is possible, and already implemented; the maximum number
> can be limited.
> I tried it on my (mostly empty) MediaWiki test setup, and it works
> peachy. However, *I NEED HELP* with
> * testing it on a large-scale installation
> * integrating it with MediaWiki more tightly (database wrappers, caching, etc.)
> * Brionizing the code, so it actually has a chance to be used on
> Wikipedia and/or Commons
> Technical notes:
> * This was recently discussed on wikitech-l
> * More than two intersections are implemented by nesting subqueries
> * Hash values are implemented as VARCHAR(32). Could easily switch to
> INTEGER if desirable (less storage, faster lookup, but more false
> positives)
> * The hash values will only give good candidates (pages that *might*
> intersect in these categories). The candidates then have to be checked
> in a second run, which will have to be optimized; database people to
> the front!
> * Table to store hash values has to be created manually; SQL is in the main file
> * I didn't implement code to fill the table for an existing
> installation; however, since hash table updates solely hang on the
> LinksUpdate hook, this should be easy
> * There is no code covering page moves and deletions yet; do those
> hang on LinksUpdate as well?
> * SQL queries are currently "plain text" and not constructed through
> the DB wrappers; I wasn't sure how to do that for the subqueries
> Cheers,
> Magnus
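
For readers unfamiliar with the scheme, the technical notes above (hash
values that yield candidates, followed by a second checking run) can be
sketched roughly as follows. This is not the extension's code. It assumes
MD5 over a sorted category pair (which fits VARCHAR(32) but is not
confirmed by the post), uses in-memory dicts in place of the SQL table,
and chains adjacent pairs instead of nesting subqueries. All function
names are hypothetical.

```python
import hashlib
from itertools import combinations


def pair_hash(cat_a, cat_b):
    """Hash an unordered category pair to 32 hex chars (assumed: MD5)."""
    first, second = sorted((cat_a, cat_b))
    return hashlib.md5(f"{first}|{second}".encode("utf-8")).hexdigest()


def build_index(page_categories):
    """Map each pair hash to the set of pages carrying both categories.

    Stands in for the manually created hash table; a real installation
    would update it from the LinksUpdate hook instead.
    """
    index = {}
    for page, cats in page_categories.items():
        for a, b in combinations(sorted(set(cats)), 2):
            index.setdefault(pair_hash(a, b), set()).add(page)
    return index


def intersect(index, page_categories, wanted):
    """Return pages that are in every category listed in `wanted`."""
    wanted = sorted(set(wanted))
    if len(wanted) < 2:
        raise ValueError("need at least two categories")
    # First run: the hash lookup only yields candidates, since a
    # shortened hash could collide.
    candidates = None
    for a, b in zip(wanted, wanted[1:]):
        pages = index.get(pair_hash(a, b), set())
        candidates = pages if candidates is None else candidates & pages
    # Second run: check each candidate against its real categories.
    required = set(wanted)
    return {p for p in candidates if required <= set(page_categories[p])}


pages = {
    "A": ["Cats", "Photos", "2008"],
    "B": ["Cats", "Photos"],
    "C": ["Cats", "2008"],
}
index = build_index(pages)
two_way = intersect(index, pages, ["Photos", "Cats"])
three_way = intersect(index, pages, ["Cats", "Photos", "2008"])
```

With a full 32-character digest the second run rarely rejects anything.
It matters more if the hash is shortened to an INTEGER, as the notes
suggest, because collisions then become more likely.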

You may want to spam the Wikinews people about this. While I believe
we are fairly happy with the DPL+toolserver thingies for all our
category intersection needs, I'm not 100% sure what you're talking
about, so someone on Wikinews might be interested in this.