On Tue, Dec 2, 2008 at 7:01 AM, Magnus Manske
<magnusmanske(a)googlemail.com> wrote:
[snip]
Articles on en.wikipedia with "1905 births"
and "1967 deaths" took <0.4 sec.
OTOH, looking for images on Commons in "GFDL" and "Buildings in
Berlin" took ~2min. Might be the giant GFDL category, or the
toolserver, or both. I'll try to fiddle with it some more utilising
cat_pages/cat_files.
No. Bleh. The horrible slowness in your results comes from broken
methodology. (2 seconds is unacceptably slow by a factor of 10x, as
far as I'm concerned.)
Please see:
https://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-September/026715.…
If you go around blaming big categories I will be forced to hunt you
down and kill you. The constant mindset of "big categories = slow"
leads people to build pre-made intersections to reduce category sizes
rather than using atomic categories. We can make big categories
blindingly fast, but we simply cannot make the recursion needed to
get sensible outcomes from pre-made intersections fast.
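To illustrate the point, here is a minimal sketch of a category
intersection done as a plain self-join. It uses an in-memory SQLite
table loosely modeled on MediaWiki's categorylinks table (the column
names cl_from/cl_to echo the real schema, but the setup here is a toy
assumption, not the production layout). With a (cl_to, cl_from) index,
each category contributes one index range scan and the intersection
needs no recursion at all, regardless of category size:

```python
import sqlite3

# Toy schema loosely based on MediaWiki's categorylinks table:
# cl_from = page id, cl_to = category name. Illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
CREATE INDEX cl_to_from ON categorylinks (cl_to, cl_from);
""")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "1905_births"), (1, "1967_deaths"),
     (2, "1905_births"),
     (3, "1905_births"), (3, "1967_deaths")],
)

# Intersection as a self-join: one index range scan per category,
# no walking of pre-made intersection categories.
rows = conn.execute("""
    SELECT a.cl_from
    FROM categorylinks a
    JOIN categorylinks b ON a.cl_from = b.cl_from
    WHERE a.cl_to = '1905_births' AND b.cl_to = '1967_deaths'
    ORDER BY a.cl_from
""").fetchall()
print([r[0] for r in rows])  # page ids in both categories
```

Each additional category is just one more join against the same
index, which is why large atomic categories stay cheap.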
I had a tool on the toolserver that gave HTML and JSON interfaces for
doing queries against your choice of enwp or commons, ... the worst
case results I could get out of it were on the order of ~30ms when
using up to 10 categories. I didn't bother to maintain it because I
mostly got complaints that it was not useful: it didn't find most
things because it couldn't walk the category tree.