Timwi wrote
>
> You method would:
> > * retrieve all pages from category1
> > * retrieve all pages from category2
> > * sort the whole list
> > * group them (i.e. replace runs with their counts)
> > * filter out those where the count is 2
> >
> > The straightfoward method would be:
> >
> > select a.page from categorylinks a
> > where a.category='category1' AND EXISTS (
> > select * from categorylinks b
> > where a.page=b.page AND b.category='category2'
> > )
> >
> > This would:
> > * retrieve all the pages from category2
> > * find all the pages in category1 that are any of those
> >
> > I don't have MySQL handy, but tested this on MSSQL and the second method
> > was way faster. :)
> >
> > Timwi
> >
Just out of curiousity, I tried it in MySQL with real data (25,000 rows of
categorylinks - not wikipedia, but something). I cycle through each type of
query 100 times, and did no other processing besides running the loop and
executing the query. Each way took about 20 seconds for the 100 loops. I
also ran a control query ("SELECT * FROM categorylinks LIMIT 1") and it took
lik .05 seconds, and then .014 or something - it must be caching the query.
I didn't try it with the joins yet. So it looks like MySQL handles it
differently than MSSQL. I wonder if there's a better way than both of these
(maybe using the joins? I'll have to try it.).
Best Regards,
Aerik