Message: 4
Date: Thu, 6 Mar 2008 21:16:09 +0000
From: "Magnus Manske" <magnusmanske(a)googlemail.com>
Subject: [Commons-l] Category intersection: New extension available
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>,
"Wikimedia Commons Discussion List"
<commons-l(a)lists.wikimedia.org>
Message-ID:
<fab0ecb70803061316x3963e31ag950b47803fe1864f(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
(cross-posting wikitech-l and commons-l)
I created a new extension ("CategoryIntersection") that allows for
quick lookup of pages (and images) in intersecting categories. That
would enable wiki(m|p)edia sites to use categories as tags,
eliminating the need for oh-so-specialized categories.
Intersection of two categories works very fast. Intersecting more
categories is also possible and already implemented; the maximum
number can be limited.
I tried it on my (mostly empty) MediaWiki test setup, and it works
peachy. However, *I NEED HELP* with
* testing it on a large-scale installation
* integrating it with MediaWiki more tightly (database wrappers, caching, etc.)
* Brionizing the code, so it actually has a chance to be used on
Wikipedia and/or Commons
Technical notes:
* This was recently discussed on wikitech-l
* More than two intersections are implemented by nesting subqueries
* Hash values are implemented as VARCHAR(32). Could easily switch to
INTEGER if desirable (less storage, faster lookup, but more false
positives)
* The hash values will only give good candidates (pages that *might*
intersect in these categories). The candidates then have to be checked
in a second run, which will have to be optimized; database people to
the front!
* Table to store hash values has to be created manually; SQL is in the main file
* I didn't implement code to fill the table for an existing
installation; however, since hash table updates solely hang on the
LinksUpdate hook, this should be easy
* There is no code covering page moves and deletions yet; do those
hang on LinksUpdate as well?
* SQL queries are currently "plain text" and not constructed through
the DB wrappers; I wasn't sure how to do that for the subqueries
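To make the notes above more concrete, here is a minimal sketch of the
candidate-then-verify idea in Python with an in-memory SQLite database.
It is not the extension's code. The table names (cat_hash,
categorylinks), the helper names, and the choice to hash sorted
category pairs with MD5 are all assumptions made for illustration; only
the 32-character hash column, the nested subqueries for more than two
categories, and the second verification run come from the notes.

```python
import hashlib
import sqlite3
from itertools import combinations


def pair_hash(cat_a, cat_b):
    """Return a 32-char MD5 hex digest for an order-independent category pair.

    Assumption: the real extension may build its hash differently.
    """
    first, second = sorted((cat_a, cat_b))
    return hashlib.md5(f"{first}|{second}".encode("utf-8")).hexdigest()


def build_db(pages):
    """Create hypothetical tables from a dict of page title -> set of categories."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE categorylinks (page TEXT, category TEXT)")
    db.execute("CREATE TABLE cat_hash (page TEXT, hash VARCHAR(32))")
    db.execute("CREATE INDEX cat_hash_idx ON cat_hash (hash)")
    for page, cats in pages.items():
        db.executemany(
            "INSERT INTO categorylinks VALUES (?, ?)",
            [(page, cat) for cat in cats],
        )
        # One hash row per pair of categories the page belongs to.
        db.executemany(
            "INSERT INTO cat_hash VALUES (?, ?)",
            [(page, pair_hash(a, b)) for a, b in combinations(sorted(cats), 2)],
        )
    return db


def candidates(db, cats):
    """First run: pages whose hashes match every adjacent pair of sorted cats.

    More than two categories are handled by nesting subqueries.
    Requires at least two categories.
    """
    cats = sorted(cats)
    pairs = list(zip(cats, cats[1:]))
    sql = "SELECT page FROM cat_hash WHERE hash = ?"
    params = [pair_hash(*pairs[0])]
    for a, b in pairs[1:]:
        # The new placeholder precedes the nested query, so its value goes first.
        sql = f"SELECT page FROM cat_hash WHERE hash = ? AND page IN ({sql})"
        params.insert(0, pair_hash(a, b))
    return [row[0] for row in db.execute(sql, params)]


def verify(db, cats, pages):
    """Second run: keep only candidates that really are in all categories."""
    wanted = set(cats)
    confirmed = []
    for page in pages:
        rows = db.execute(
            "SELECT category FROM categorylinks WHERE page = ?", (page,)
        )
        if wanted <= {row[0] for row in rows}:
            confirmed.append(page)
    return sorted(confirmed)


def intersect(db, cats):
    """Candidate lookup followed by verification."""
    return verify(db, cats, candidates(db, cats))


if __name__ == "__main__":
    demo = build_db({
        "Eiffel.jpg": {"Paris", "Towers", "Night"},
        "Louvre.jpg": {"Paris", "Museums"},
        "Pisa.jpg": {"Towers", "Italy"},
    })
    print(intersect(demo, ["Paris", "Towers"]))
    print(intersect(demo, ["Paris", "Towers", "Night"]))
```

With full MD5 digests the verification step rarely rejects anything;
it matters more if the hash column is shrunk to an INTEGER, where
collisions (false positives) become more likely.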
Cheers,
Magnus
You may want to spam the wikinews people about this. While I believe
we're fairly happy with DPL+toolserver thingies for all our
category intersection
needs, I'm not 100% sure what you're talking about, so someone on
wikinews might be interested in this.
Cheers,
bawolff