I've been thinking about moving from the default to Lucene, and am NOT an expert, so take the following with lots of NaCl. I'd like to hear what people who know what they're talking about think!
As I understand it, Lucene builds its indexes and stores them in a set of index files that are kept in memory or swapped in as needed; it does not use the backend database that's running the wiki. By contrast, Sphinx works via MySQL. I believe this difference can matter as the size and use of the wiki increase, since search can end up taxing the db and degrading MySQL performance. But if Lucene sucks up all your free memory, you could get performance problems outside MySQL. This is probably not an issue for your setup behind a firewall, but I'm wondering how to think about the tradeoffs for a smallish single-server wiki that sometimes gets swamped by search engine hits. And yes, I know that I need to learn more about robots.txt too...
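On the robots.txt point, here is a rough sketch (not a tested recommendation) of how one might check a candidate rule set before deploying it, using Python's standard urllib.robotparser. The rules and the /index.php and /wiki/ paths are assumptions about a typical short-URL MediaWiki layout, not anything taken from an actual wiki; the idea is only that crawlers stay on article pages and off the script path that serves search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose articles live under /wiki/
# and whose dynamic pages (search, history, edit) go through /index.php.
# Both paths are assumptions; adjust them to the real URL layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
Disallow: /wiki/Special:Search
"""


def is_allowed(url_path, agent="*"):
    """Return True if the sample rules let `agent` fetch `url_path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    for path in (
        "/wiki/Main_Page",
        "/index.php?title=Special:Search&search=lucene",
        "/wiki/Special:Search",
    ):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that robots.txt only helps with crawlers that choose to honor it, so it would reduce load from well-behaved search engines but would not change the Lucene vs. Sphinx memory and database tradeoff itself.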
Google also sells search appliances, in case you really want it to be exactly like Google. ;)
Jim
On Dec 5, 2007, at 1:57 PM, Emufarmers Sangly wrote:
On Dec 5, 2007 10:23 AM, Jonathan Nowacki jnowacki@gmail.com wrote:
I have a MediaWiki-based resource that needs a full-text search engine. Google will not work, as it is not yet a public resource. Does anyone have any recommendations? This is intended to be used at an academic institution.
Lucene <http://www.mediawiki.org/wiki/Extension:LuceneSearch> is what Wikipedia uses. You might also want to take a look at Sphinx <http://www.mediawiki.org/wiki/Extension:SphinxSearch>.
--
Arr, ye emus, http://emufarmers.com
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054