Timwi wrote
You method would:
- retrieve all pages from category1
- retrieve all pages from category2
- sort the whole list
- group them (i.e. replace runs with their counts)
- filter out those where the count is 2
The straightfoward method would be:
select a.page from categorylinks a where a.category='category1' AND EXISTS ( select * from categorylinks b where a.page=b.page AND b.category='category2' )
This would:
- retrieve all the pages from category2
- find all the pages in category1 that are any of those
I don't have MySQL handy, but tested this on MSSQL and the second method was way faster. :)
Timwi
Hey, cool! I don't have my ego wrapped up in it or anything, so if that's better, great! I don't know what MySQL's internal processes are, so I can only speculate as to what the pros and cons are to each method (whether it actually does two queries, or just one, under what circumstances, etc... I have no idea).
Thanks for calling it "clever" - honestly, it was the only way I could think of to do it. I've seen it done (dynamic page lists) with a lot of joins, but it struck me that all those joins had to incur a lot of overhead. Maybe I'll write the three different implementations and run them head-to-head in MySQL. Would be interesting, anyway.
Best Regards, Aerik
wikitech-l@lists.wikimedia.org