On Friday 19 September 2008 05:00:19 am Maarten Dammers wrote:
Daniel Schwen wrote:
It is currently almost impossible to harvest information from the category tree. Database requests to find common super-categories and perform boolean operations are not feasible, since they run on the order of minutes.
Almost impossible? Take a look at http://toolserver.org/~multichill/filtercats.php
Unfortunately you are comparing apples to apple orchards. And maybe I wasn't making myself clear enough. Finding common super-categories is actually not the biggest problem. That direction branches less than going _down_ in the tree.
Anyhow, I have worked with categories before. Check out this tool for the English Wikipedia: http://toolserver.org/~dschwen/intersection/
Let's say you want an intersection of all "Actors" in "Germany". Just intersecting those two categories won't cut it, thanks to the myriad subcategories. The tool has to deep-index several levels of subcategories and include their content in the intersection. Try it! The default depth of 2 levels won't be enough in most cases, and the processing time goes up exponentially with depth.
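To make the deep-indexing idea concrete, here is a minimal sketch in Python. It is not the code behind the intersection tool; the tiny in-memory category graph, the page names, and the function names `pages_within` and `deep_intersection` are all made up for illustration, and a real tool would query the database instead.

```python
from collections import deque

# Hypothetical toy data: category -> direct subcategories.
SUBCATS = {
    "Actors": ["Actors by nationality"],
    "Actors by nationality": ["German actors"],
    "Germany": ["German people"],
    "German people": ["German actors"],
}

# Hypothetical toy data: category -> pages filed directly in it.
MEMBERS = {
    "Actors": {"Acting"},
    "German actors": {"Marlene Dietrich", "Bruno Ganz"},
    "German people": {"Albert Einstein"},
    "Germany": {"Berlin"},
}


def pages_within(root, depth):
    """Collect pages in `root` and in its subcategories up to `depth` levels down."""
    seen = {root}
    queue = deque([(root, 0)])
    pages = set()
    while queue:
        cat, level = queue.popleft()
        pages |= MEMBERS.get(cat, set())
        if level < depth:
            for sub in SUBCATS.get(cat, []):
                if sub not in seen:  # the category graph may contain cycles
                    seen.add(sub)
                    queue.append((sub, level + 1))
    return pages


def deep_intersection(cat_a, cat_b, depth):
    """Intersect the depth-limited contents of two categories."""
    return pages_within(cat_a, depth) & pages_within(cat_b, depth)


if __name__ == "__main__":
    for depth in range(3):
        print(depth, sorted(deep_intersection("Actors", "Germany", depth)))
```

In this toy graph the intersection stays empty at depths 0 and 1 and only picks up the two actors at depth 2. On the real tree every extra level multiplies the number of categories that have to be expanded, which is where the processing time goes.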
In short, Multichill's super-category lookup works for one image. To perform a proper category intersection you'd have to perform a super-category lookup for _every_ image on Commons for each intersection!
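For comparison, a rough sketch of what intersecting by upward lookup would look like. Again this is only an illustration under assumed data: the parent map, the page list, and the names `super_categories` and `intersect_by_upward_lookup` are hypothetical and not taken from either tool.

```python
# Hypothetical toy data: category -> its direct parent categories.
PARENTS = {
    "German actors": ["Actors by nationality", "German people"],
    "Actors by nationality": ["Actors"],
    "German people": ["Germany"],
}

# Hypothetical toy data: page -> categories it is filed in directly.
PAGE_CATS = {
    "Marlene Dietrich": ["German actors"],
    "Albert Einstein": ["German people"],
    "Berlin": ["Germany"],
}


def super_categories(page):
    """Walk upward from one page and return every category above it."""
    stack = list(PAGE_CATS.get(page, []))
    seen = set()
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        stack.extend(PARENTS.get(cat, []))
    return seen


def intersect_by_upward_lookup(cat_a, cat_b):
    """Run one upward walk per page, keeping pages that sit below both categories."""
    return {
        page
        for page in PAGE_CATS
        if {cat_a, cat_b} <= super_categories(page)
    }


if __name__ == "__main__":
    print(sorted(super_categories("Marlene Dietrich")))
    print(sorted(intersect_by_upward_lookup("Actors", "Germany")))
```

A single upward walk is cheap, but `intersect_by_upward_lookup` has to repeat it for every page it knows about, and that loop would have to run again for each new pair of categories.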