My impression of Sphinx came from a VERY superficial scan of their website. I didn't know there were non-MySQL options!
Which makes it an even harder choice, I guess. I'm still leaning toward Lucene, largely because others who are using it seem to be happy, and they seem to be telling me that the memory concern is not significant (again, this is from a too-quick scan of the docs on the Apache/Lucene website). I suspect they were talking about the size issue for indexing the whole internet, not just one wiki.
My thinking is partly based on the guess that, because more wikis are using Lucene and Wikipedia is among them, further development and improvement of the extension is likely to be better. But for all I know, everyone will switch to Sphinx by the time I educate myself sufficiently and find the free time to actually install Lucene! ; )
Jim
On Dec 6, 2007, at 9:57 AM, Samuel Lampa wrote:
Jim Hu wrote:
As I understand it, Lucene builds its indexes and stores them in a set of index files that are kept in memory or swapped in as needed, and it does not use the backend database that's running the wiki. By contrast, Sphinx works via MySQL.
Regarding indexes, Sphinx can be set up to use either a MySQL backend or its own data format, which is the standard mode. It might be, though, that the SphinxSearch extension ( http://www.mediawiki.org/wiki/Extension:SphinxSearch ) uses the wiki's database to get the article extracts for the search page, since these extracts are not in the indexes.
I have, btw, been impressed by Sphinx's indexing speed (something like 1000 pages / 6 sec), as well as its set of features and config options (multi-language stemming etc.), and think it looks very promising.
Regards Samuel
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054