On 30/04/11 21:38, MZMcBride wrote:
> Where's the best documentation for the search
> setup? And are there any pages
If by setup you mean the setup WMF is using, then . If by
setup you mean how we use Lucene (with some historical context), then 
and  are a good starting point. Apart from that, it's a matter of reading
the comments in the code.
> with a roadmap for future development?
The roadmap is pretty much fixing the bugs reported in Bugzilla for the
lucene-search extension. There are quite a few of them, but most of them
are of a technical nature.
Any further improvement in the *quality* of search results would
require employing someone who specialises in natural language
processing/data mining/search to improve on the existing algorithms. The
algorithms we currently use are pretty much state of the art in the
open-source world, and I would consider any further improvement as proper
> I'm particularly curious if the Java component
> can't be killed.
I doubt it. We don't simply use Lucene out of the box, so we can't
just switch to another port. In fact, the backend search extension
(lucene-search) is pretty big, with some 50k lines of code. It
implements a number of algorithms I put together to work with the way
information is structured on Wikipedia, in the languages I speak.