Re: [Wikitech-l] Wikitech-l Digest, Vol 37, Issue 96

24 Aug 2006

      Timwi wrote
...
You method would:
...

retrieve all pages from category1
retrieve all pages from category2
sort the whole list
group them (i.e. replace runs with their counts)
filter out those where the count is 2

The straightfoward method would be:
select a.page from categorylinks a
where a.category='category1' AND EXISTS (
        select * from categorylinks b
        where a.page=b.page AND b.category='category2'
)
This would:

retrieve all the pages from category2
find all the pages in category1 that are any of those

I don't have MySQL handy, but tested this on MSSQL and the second method
was way faster. :)
Timwi
Just out of curiousity, I tried it in MySQL with real data (25,000 rows of
categorylinks - not wikipedia, but something).  I cycle through each type of
query 100 times, and did no other processing besides running the loop and
executing the query.  Each way took about 20 seconds for the 100 loops.  I
also ran a control query ("SELECT * FROM categorylinks LIMIT 1") and it took
lik .05 seconds, and then .014 or something - it must be caching the query.
I didn't try it with the joins yet.  So it looks like MySQL handles it
differently than MSSQL.  I wonder if there's a better way than both of these
(maybe using the joins?  I'll have to try it.).
Best Regards,
Aerik

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Wikitech-l] Wikitech-l Digest, Vol 37, Issue 96