On Tue, Dec 2, 2008 at 7:01 AM, Magnus Manske magnusmanske@googlemail.com wrote: [snip]
Articles on en.wikipedia with "1905 births" and "1967 deaths" took <0.4 sec. OTOH, looking for images on Commons in "GFDL" and "Buildings in Berlin" took ~2min. Might be the giant GFDL category, or the toolserver, or both. I'll try to fiddle with it some more utilising cat_pages/cat_files.
No. Bleh. The horrible slowness in your results comes from broken methodology. (2 seconds is unacceptably slow by a factor of 10x, as far as I'm concerned.)
Please see: https://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-September/026715.h...
If you go around blaming big categories I will be forced to hunt you down and kill you. The constant mindset of "big categories = slow" results in people building pre-made intersections to reduce category sizes rather than using atomic categories. We can make big categories blindingly fast, but we simply cannot make the recursion needed to get sensible outcomes from pre-made intersections fast.
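To illustrate why a big atomic category need not be slow, here is a minimal sketch of intersecting category memberships held as sets of page IDs. It always starts from the smallest set, so the giant category is only probed for membership and never scanned. The function name `intersect_categories`, the in-memory `index` mapping, and the sample data are all hypothetical; this is an illustration under those assumptions, not the code behind any actual tool.

```python
from typing import Dict, Iterable, Set


def intersect_categories(index: Dict[str, Set[int]], names: Iterable[str]) -> Set[int]:
    """Return the page IDs that belong to every named category.

    `index` is a hypothetical in-memory mapping: category name -> set of page IDs.
    """
    sets = [index.get(name, set()) for name in names]
    if not sets:
        return set()
    # Process the smallest category first so the candidate set starts small.
    sets.sort(key=len)
    result = set(sets[0])
    for other in sets[1:]:
        # Iterate over the (small) candidate set and probe the larger one;
        # each probe is an average O(1) hash lookup.
        result = {page_id for page_id in result if page_id in other}
        if not result:
            break  # nothing left to match, stop early
    return result


# Made-up sample data: one huge category and one small one.
sample_index = {
    "GFDL": set(range(1, 100001)),
    "Buildings in Berlin": {5, 42, 99999, 100005},
}
matches = intersect_categories(sample_index, ["GFDL", "Buildings in Berlin"])
```

With this shape the work is bounded by the size of the smallest category involved, which is why the size of the largest one matters very little.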
I had a tool on the toolserver that gave HTML and JSON interfaces for doing queries against your choice of enwp or commons. The worst-case results I could get out of it were on the order of 30ms when using up to 10 categories. I didn't bother to maintain it because I mostly got complaints that it was not useful: it didn't find most things because it couldn't walk the category tree.
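For context on what "walking the category tree" involves, here is a rough sketch that collects every category reachable beneath a root with a breadth-first walk. The `subcats` mapping (category name -> list of direct subcategories), the function name `descendants`, and the sample data are hypothetical. The `seen` set is there because category graphs can contain cycles; the optional `max_depth` is an assumption of mine about how one might bound the walk.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def descendants(subcats: Dict[str, List[str]], root: str,
                max_depth: Optional[int] = None) -> Set[str]:
    """Return `root` plus every category reachable below it.

    `subcats` is a hypothetical mapping: category -> direct subcategories.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth limit
        for child in subcats.get(cat, ()):
            if child not in seen:  # guards against cycles and repeats
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


# Made-up sample data, including a cycle back to the root.
sample_subcats = {
    "Buildings in Germany": ["Buildings in Berlin"],
    "Buildings in Berlin": ["Churches in Berlin", "Buildings in Germany"],
}
tree = descendants(sample_subcats, "Buildings in Germany")
```

Every category returned here would then need its members unioned before any intersection can run, which is the extra work that flat atomic categories avoid.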