Robert Stojnic wrote:
Let me briefly repeat what I said earlier about my experience with this
category intersection thingy. Adding categories to the Lucene index is
easy *IF* they are inside the article, e.g. try this:
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Bin…
This will give you the category intersection of "Living People" and
"English comedy writers" in a fraction of a second.
What I found is that the hard part is keeping the index updated. If we
want the fancy category intersection system discussed here before, we
need an index that is frequently updated, that is integrated with the
job queue, that understands templates, etc.
You don't need the article text, just query the categorylinks table.
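
A minimal sketch of such a query, assuming the standard categorylinks
columns cl_from (page ID) and cl_to (category title). It runs against an
in-memory SQLite table so it is self-contained; the helper names and the
sample rows are made up for illustration and are not taken from any
existing implementation.

```python
import sqlite3


def category_intersection(conn, categories):
    """Return sorted page IDs that are in every one of `categories`."""
    cats = list(dict.fromkeys(categories))  # drop duplicates, keep order
    if not cats:
        return []
    placeholders = ",".join("?" for _ in cats)
    sql = (
        "SELECT cl_from FROM categorylinks "
        f"WHERE cl_to IN ({placeholders}) "
        "GROUP BY cl_from "
        # A page qualifies only if it matched all requested categories.
        "HAVING COUNT(DISTINCT cl_to) = ? "
        "ORDER BY cl_from"
    )
    rows = conn.execute(sql, [*cats, len(cats)]).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a tiny stand-in for categorylinks (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Living_people"),
            (1, "English_comedy_writers"),
            (2, "Living_people"),
            (3, "English_comedy_writers"),
        ],
    )
    return conn
```

Calling category_intersection(demo_connection(), ["Living_people",
"English_comedy_writers"]) selects only page 1, the one page tagged
with both categories.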
Lucene is not that good with very frequent updates. The usual setup is
to have an indexer, make snapshots of the index at regular intervals,
and then rsync them onto the searchers. The whole process takes time,
although for a category-only index it will probably be fast. I assume
there would be at least a few tens of minutes of lag anyhow. Our current
Lucene framework could easily be used for index distribution and such.
Categories don't change that often, so I don't think 10 minutes of lag
is that bad.
What remains unsolved, however, is keeping the index updated with the
latest changes on the site. If one changes a template with a category in
it, the thing goes on the job queue. I assume there would need to be
some kind of hook that will either log the change somewhere or send data
to Lucene somehow. This is the part of the backend that needs thinking
and solving.
There's the LinksUpdate hook, which is also used in Magnus's
implementation.
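
To make the "log the change somewhere" option concrete, here is a rough,
language-neutral sketch in Python. It does not use the MediaWiki or
Lucene APIs: PendingCategoryChanges, reindex and fetch_categories are
hypothetical names, and the dict stands in for the real index. The
assumption is that a hook handler calls record() for each page whose
links were updated, and the indexer drains the log on its own schedule.

```python
import threading


class PendingCategoryChanges:
    """Collects IDs of pages whose category links changed (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def record(self, page_id):
        """Called from the hook; repeated changes to a page collapse."""
        with self._lock:
            self._pending.add(page_id)

    def drain(self):
        """Hand the current batch to the indexer and start a new one."""
        with self._lock:
            batch, self._pending = self._pending, set()
        return sorted(batch)


def reindex(changes, fetch_categories, index):
    """Refresh `index` for every page changed since the last drain.

    `fetch_categories(page_id)` is assumed to return the page's current
    categories, e.g. from the categorylinks table.
    """
    batch = changes.drain()
    for page_id in batch:
        cats = fetch_categories(page_id)
        if cats:
            index[page_id] = set(cats)
        else:
            # Page lost all its categories (or was deleted): drop it.
            index.pop(page_id, None)
    return batch
```

Because the log is a set, a template edit that touches the same page
several times via the job queue costs only one reindex of that page per
batch.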
Roan Kattouw (Catrope)