Regarding Sphinx, I haven't used it, but I'm excited that there's a feasible non-Java alternative to Lucene. Don't get me wrong, I like Lucene in principle. However, many hosts don't permit Java apps, or do so only at a significantly higher cost.
Ferret[1] may also work, though I don't know if anyone's attempted using it for MW search.
Is there any work being done on a native PHP alternative? Something which could conceivably plug into the ArticleSaveComplete hook and keep a text index up-to-date?
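To make that idea concrete, here is a rough sketch of the kind of incremental index a save-hook handler might maintain. It is in Python rather than PHP purely for brevity, and every name in it (TextIndex, on_article_save, search) is made up for illustration; it is not MediaWiki code and does not reflect the real ArticleSaveComplete signature. It also assumes a plain in-memory index, where a real one would need persistent storage.

```python
import re
from collections import defaultdict


class TextIndex:
    """Tiny in-memory inverted index: term -> set of page titles."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> titles containing it
        self.terms_by_title = {}           # title -> terms from its last save

    @staticmethod
    def tokenize(text):
        # Lowercase first so that lookups are case-insensitive.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def on_article_save(self, title, text):
        """Hypothetical save callback: re-index a single page."""
        new_terms = self.tokenize(text)
        old_terms = self.terms_by_title.get(title, set())
        # Drop postings for terms that no longer appear on the page.
        for term in old_terms - new_terms:
            self.postings[term].discard(title)
            if not self.postings[term]:
                del self.postings[term]
        # Add postings for terms that are new in this revision.
        for term in new_terms - old_terms:
            self.postings[term].add(title)
        self.terms_by_title[title] = new_terms

    def search(self, query):
        """Return the titles that contain every term in the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        results = None
        for term in terms:
            titles = self.postings.get(term, set())
            results = set(titles) if results is None else results & titles
        return results
```

Each save only touches the terms that changed between revisions, so the index stays current without a full rebuild. Ranking, stemming and spell checking are left out entirely.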
[1] http://ferret.davebalmain.com/trac/
-- Jim R. Wilson (jimbojw)
On 9/25/07, Rob Church robchur@gmail.com wrote:
On 25/09/2007, Maury Markowitz maury.markowitz@gmail.com wrote:
Out of curiosity, what is the reason we don't just use Google (or such)? The search capabilities in en.wiki are almost completely useless, IMHO: it can't even find articles with identical names if the capitalization is wrong, and it completely lacks anything like spell checking or reasonable relevance ranking. In order to find articles on en.wiki, I invariably open a second browser to search in, and this strikes me as rather sub-optimal. Is there anything I can do about this at "my end"?
I'm delighted to hear that all the excellent work that's been put into developing Lucene Search for Wikipedia of late, as well as the ongoing work, is so easily characterised as pointless. Or perhaps you haven't tried searching for anything lately; it's getting better.
We don't "just use Google" because we'd like our users to remain within the same site when searching it, to avoid confusion due to an inconsistent user experience, and the only means Google would provide us with to avoid that are either too expensive to justify, or proprietary, something we're desperately committed to avoiding.
Rob Church
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l