Timwi wrote:
... I thought I had mentioned before (maybe not to you) that this query is unnecessarily inefficient (because it scans both categories completely). Why do you use this adventurous approach when there is a much simpler and more obvious one, which also happens to be faster?
SELECT c.cl_from FROM categorylinks c WHERE c.cl_from IN ( SELECT c2.cl_from FROM categorylinks c2 WHERE c2.cl_to='Fantasy_films' ) AND c.cl_to='Disney_films'
Showing rows 0 - 8 (9 total, Query took 0.9231 sec) <-- took longer than the count/group by query SQL query: SELECT c.cl_from FROM categorylinks c WHERE c.cl_from IN (
SELECT c2.cl_from FROM categorylinks c2 WHERE c2.cl_to = 'Fantasy_films' ) AND c.cl_to = 'Disney_films' LIMIT 0 , 30
I tried your way before, an a smaller data set, and got basically identical results for your way and mine. Trying your way on the larger dataset (copy of en), it appears it may be slower than mine. I think I recall that when you tested it, it was in MSSQL, and not MySQL. If the efficiency is equal, I like this approach becuase it scales seasily to intersections of more than two cateogories. It looks like the count/group by approach may be faster in MySQL, based on this example.
Aerik
wikitech-l@lists.wikimedia.org