Chris Reigrut wrote:
Thanks!
You are correct in assuming that this is designed for small-to-medium sized wikis--we have about 1,000 users and a couple of hundred edits per day, but our scale testing indicated it would handle at least 20-30 combined queries/updates per second (of normal-sized pages).
Yes, I am fairly sure that would be enough for almost any MediaWiki site except the few largest...
I'm assuming that you mean "single-host" from an indexing-server point of view, and yes, at this time that is completely correct. Article indexing, however, can easily support multiple MediaWiki servers calling it. Currently the attachment indexing also relies on there being only a single MediaWiki server, but that's an easy modification.
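To illustrate the multiple-callers point, here is a minimal sketch of a single indexing host that any number of MediaWiki servers could POST article updates to. The port, URL layout, and class names are hypothetical (this is not the actual lucene-search wire protocol), and it uses only the JDK's built-in HttpServer:

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class IndexEndpoint {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8123), 0);
            // Any number of wiki front-ends can POST here; all writes funnel
            // through this one process, so no cross-host write coordination
            // is needed.
            server.createContext("/update", IndexEndpoint::handleUpdate);
            server.start();
        }

        static void handleUpdate(HttpExchange ex) throws IOException {
            String query = ex.getRequestURI().getQuery(); // e.g. "title=Some_Page"
            byte[] body = ex.getRequestBody().readAllBytes();
            // Hand the article text off to the local Lucene writer here.
            System.out.printf("update %s (%d bytes)%n", query, body.length);
            byte[] ok = "OK".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, ok.length);
            ex.getResponseBody().write(ok);
            ex.close();
        }
    }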
Agreed. However, the real-time update can only work with a single-host setup, since if one wants multiple searchers one needs some sort of index replication, which raises all kinds of issues: how frequently to replicate, whether to optimize the index first, how to deal with index warmup and hot swaps, synchronization overhead, and so on...
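For a sense of what just the warmup/hot-swap part involves, here is a minimal sketch using the modern Lucene API (DirectoryReader.openIfChanged); the index path is made up, and lucene-search itself was built on an older Lucene version:

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.store.FSDirectory;

    public class HotSwap {
        public static void main(String[] args) throws Exception {
            DirectoryReader reader =
                DirectoryReader.open(FSDirectory.open(Paths.get("/srv/index")));
            IndexSearcher searcher = new IndexSearcher(reader);

            // After a replica has pulled a fresh copy of the index it must
            // reopen and warm the new reader before swapping it in, or the
            // first queries after the swap hit cold caches.
            DirectoryReader fresh = DirectoryReader.openIfChanged(reader);
            if (fresh != null) {
                IndexSearcher warmed = new IndexSearcher(fresh);
                warmed.search(new MatchAllDocsQuery(), 10); // crude warmup query
                searcher = warmed;  // swap in the warmed searcher
                reader.close();     // release the old segment files
                reader = fresh;
            }
        }
    }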
There is also a preload mechanism that grabs the pages directly from the database for indexing. At some point I intend to combine the two, keeping the real-time update but also providing a background indexer in case the real-time feed fails for some reason (thereby ensuring that no articles are missed). For us a failed feed is not a big problem, as we can reindex everything in about an hour.
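A background indexer of that kind might look roughly like the following sketch. The JDBC URL and credentials are placeholders, the indexArticle helper is hypothetical, and the query assumes the classic MediaWiki page/revision/text schema (details vary by MediaWiki version):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PreloadIndexer {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost/wikidb", "wiki", "secret");
                 Statement st = db.createStatement();
                 // Latest revision text for every page.
                 ResultSet rs = st.executeQuery(
                     "SELECT page_title, old_text FROM page " +
                     "JOIN revision ON rev_id = page_latest " +
                     "JOIN text ON old_id = rev_text_id")) {
                while (rs.next()) {
                    // Reuse the same code path as the real-time update, so a
                    // full sweep simply overwrites whatever is already indexed
                    // and picks up anything the feed missed.
                    indexArticle(rs.getString("page_title"),
                                 rs.getString("old_text"));
                }
            }
        }

        static void indexArticle(String title, String text) {
            /* hand off to the Lucene IndexWriter (omitted) */
        }
    }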
One other problem people had with lucene-search is that it eats a lot of resources... lucene-search can easily use couple of gigs of RAM just for the java process because of all of different caches and stuff. I was wondering if it was possible to make a lightweight server that would nicely work e.g. on an oldish machine with 128mb of ram or on shared hosting?
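To make the question concrete, one direction would be to cap the JVM heap and shrink Lucene's indexing buffer, letting the OS page cache do the caching instead of the Java heap. A minimal sketch, where the numbers are guesses for a 128 MB box and the API shown is the modern Lucene one:

    // run with a small heap, e.g.:  java -Xmx64m LowMemIndexer
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class LowMemIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            cfg.setRAMBufferSizeMB(8.0); // flush to disk early instead of buffering
            try (IndexWriter writer = new IndexWriter(
                     FSDirectory.open(Paths.get("/srv/index")), cfg)) {
                // add documents here; FSDirectory leaves caching to the OS
                // page cache rather than the Java heap
            }
        }
    }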
R.