Christophe Henner wrote:
>
> Hi
> Still about Category search, I looked for something about it but
> didn't find anything. What about something that makes it possible to
> get all the articles matching x categories?
> For example, giving the list of all the articles in both
> [[Category:Writer]] and [[Category:Born in London]].
> Have a nice day
> --
> schiste

That's the much discussed and desired "Category Intersections", and it's a
trickier problem (at scale) than I thought. I've been testing some ideas on
and off, but have been slowed down by having a hard time clearing the
query cache on the server I'm using (I've tried "FLUSH QUERY CACHE" and
"RESET QUERY CACHE" but they don't seem to actually do it - all you MySQL
gurus out there, what am I missing?).
I've got two ideas I want to test:
1) Use the existing table and the query I've previously suggested, but
construct it smarter by considering the number of pages in each category -
in other words, purposefully narrow down the result set as early as possible
(e.g. look at "People born in 1912" first and then see how many of those are
"Living People", instead of the other way around).
2) Try building a table with a fulltext index using a record for each page,
and a column for the categories, delimited by spaces (use underscores for
spaces in a category name). This may be a bit hackish, but I'm thinking
this will get MySQL to do the tricky part of building the index on
categories (each being a word in that column) for me. The MySQL people
must've made the fulltext index code as efficient as possible, so it will be
interesting to see how it performs. I know full text indexing is not
acceptable for whole Wikipedia articles, but if we're only considering
categories, we're talking about a lot less text. I've been wondering if
maybe this is how Flickr handles tags - whatever they're doing, the
functionality seems to match what we want to do, and at a large scale, too.
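
To make the two ideas a bit more concrete, here is a rough sketch in Python.
It is only an illustration under several assumptions: the table is a
stripped-down stand-in for categorylinks (page id in cl_from, category name
in cl_to), SQLite stands in for MySQL, the query is not necessarily the one
I suggested earlier, and the second half is a toy in-memory word index rather
than MySQL's actual fulltext code (there the lookup would be something like
MATCH ... AGAINST ('+Writer +Born_in_London' IN BOOLEAN MODE)). The sample
pages and all function names are made up.

```python
import sqlite3

# Hypothetical sample data: page id -> categories (underscores for spaces).
PAGES = {
    1: ["Writer", "Born_in_London"],
    2: ["Writer"],
    3: ["Born_in_London", "Living_people"],
    4: ["Writer", "Born_in_London", "Living_people"],
    5: ["Writer", "Living_people"],
}


def build_demo_db(pages=PAGES):
    """Load the sample data into a simplified categorylinks-like table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.execute("CREATE INDEX cl_to_from ON categorylinks (cl_to, cl_from)")
    rows = [(page, cat) for page, cats in pages.items() for cat in cats]
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", rows)
    return conn


def intersect_sql(conn, categories):
    """Idea 1: let the smallest category drive the query."""
    if not categories:
        raise ValueError("need at least one category")
    count_sql = "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?"
    sizes = {c: conn.execute(count_sql, (c,)).fetchone()[0] for c in categories}
    ordered = sorted(categories, key=sizes.get)  # smallest category first
    sql = "SELECT c0.cl_from FROM categorylinks AS c0 WHERE c0.cl_to = ?"
    for i in range(1, len(ordered)):
        # Each remaining category is only probed for pages already selected.
        sql += (
            f" AND EXISTS (SELECT 1 FROM categorylinks AS c{i}"
            f" WHERE c{i}.cl_from = c0.cl_from AND c{i}.cl_to = ?)"
        )
    sql += " ORDER BY c0.cl_from"
    return ordered, [row[0] for row in conn.execute(sql, ordered)]


def build_word_index(pages=PAGES):
    """Idea 2 (toy version): one text record per page, one word per category."""
    index = {}
    for page, cats in pages.items():
        record = " ".join(cats)  # the space-delimited column
        for word in record.split():
            index.setdefault(word, set()).add(page)
    return index


def intersect_words(index, categories):
    """Return pages whose record contains every requested category word."""
    if not categories:
        raise ValueError("need at least one category")
    postings = sorted((index.get(c, set()) for c in categories), key=len)
    result = set(postings[0])
    for pages in postings[1:]:
        result &= pages
    return sorted(result)


if __name__ == "__main__":
    conn = build_demo_db()
    print(intersect_sql(conn, ["Writer", "Born_in_London"]))
    print(intersect_words(build_word_index(), ["Writer", "Born_in_London"]))
```

Both functions should return the same pages for the same categories; the
open question is which approach holds up on the real data.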
If neither of these works, then I think we're off into either Lucene or some
other search function with a custom index/data structure. But I have the
strong impression that those generally aren't updated in real time,
which is a bummer.
Best Regards,
Aerik