Thanks!
You are correct in assuming that this is designed for small-to-medium-
sized wikis--we have about 1000 users and a couple of hundred edits per
day, but our scale testing indicated it would handle at least 20-30
combined queries/updates per second (of normal-sized pages).
I'm assuming that you mean "single-host" from an indexing-server point
of view, and yes, at this time that is completely correct. Article
indexing, however, can easily support multiple Mediawiki servers calling
it. Currently the attachment indexing relies on there being only a
single Mediawiki server as well, but that's an easy modification.
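One way an indexer stays safe with several wikis feeding it is to treat
every submission as an upsert keyed on the page id, so duplicate or
racing submissions simply overwrite one another. A rough sketch of that
idea (Lucene 2.x-era API; the field names are illustrative, not
necessarily what the extension uses):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;

  public class ArticleIndexer {
      private final IndexWriter writer;

      public ArticleIndexer(IndexWriter writer) {
          this.writer = writer;
      }

      // Any wiki server may call this; updateDocument() is a
      // delete-then-add keyed on the page id, so the last write wins
      // no matter which server it came from.
      public void indexArticle(String pageId, String title, String text)
              throws Exception {
          Document doc = new Document();
          doc.add(new Field("id", pageId,
                  Field.Store.YES, Field.Index.NOT_ANALYZED));
          doc.add(new Field("title", title,
                  Field.Store.YES, Field.Index.ANALYZED));
          doc.add(new Field("text", text,
                  Field.Store.NO, Field.Index.ANALYZED));
          writer.updateDocument(new Term("id", pageId), doc);
      }
  }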
There is a preload mechanism that grabs the pages directly from the
database for indexing as well. At some point I intend to combine the
two, keeping the real-time update but also providing a background
indexer in case the real-time feed fails for some reason (thereby
ensuring that no articles are missed). For us the latter's not a big
problem, as we can reindex everything in about an hour.
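Conceptually, such a preload pass is just a sweep over the page table
that feeds the same upsert path as the live hook. A rough sketch,
assuming the default MediaWiki schema with no table prefix and reusing
the ArticleIndexer sketch above:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class PreloadIndexer {
      private final ArticleIndexer indexer;

      public PreloadIndexer(ArticleIndexer indexer) {
          this.indexer = indexer;
      }

      // Sweep every page straight out of the wiki database and push it
      // through the same upsert the real-time hook uses, so anything
      // the live feed missed is picked up on the next pass.
      public void reindexAll(String jdbcUrl, String user, String pass)
              throws Exception {
          Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
          try {
              Statement st = conn.createStatement();
              // Default MediaWiki table names; ignores old_flags
              // (compressed/external text storage) for brevity.
              ResultSet rs = st.executeQuery(
                  "SELECT p.page_id, p.page_title, t.old_text " +
                  "FROM page p " +
                  "JOIN revision r ON r.rev_id = p.page_latest " +
                  "JOIN text t ON t.old_id = r.rev_text_id");
              while (rs.next()) {
                  indexer.indexArticle(rs.getString("page_id"),
                          rs.getString("page_title"),
                          rs.getString("old_text"));
              }
          } finally {
              conn.close();
          }
      }
  }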
Robert Stojnic wrote:
> Very nice work Chris!
> I think it strikes a good balance of simplicity and flexibility that
> makes it ideal for small-to-medium sites.
>
> The architecture itself seems to be similar to that used in early
> mwsearch, where the index is updated via hooks that submit articles
> directly to the indexer. So, it assumes a single-host architecture and
> uses out-of-the-box Lucene scoring and highlighting, as far as I can
> see. I think the most interesting part for us is the handling of
> attachments. I see you use Apache POI and PDFBox. We should really try
> to use this as well; it shouldn't be too hard to do, but needs a bit
> of fiddling...
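
For reference, the extraction side really is fairly small. A rough
sketch of pulling plain text out of a PDF with PDFBox and a .doc with
POI's HWPF extractor (PDFBox 1.x-era package name; error handling and
other formats omitted):

  import java.io.File;
  import java.io.FileInputStream;

  import org.apache.pdfbox.pdmodel.PDDocument;
  import org.apache.pdfbox.util.PDFTextStripper;
  import org.apache.poi.hwpf.extractor.WordExtractor;

  public class AttachmentText {
      // PDFs: PDFBox strips the text layer into a plain string,
      // which can then be indexed like any article body.
      public static String fromPdf(File f) throws Exception {
          PDDocument doc = PDDocument.load(f);
          try {
              return new PDFTextStripper().getText(doc);
          } finally {
              doc.close();
          }
      }

      // Binary .doc files: POI's HWPF extractor does the same
      // for Word documents.
      public static String fromDoc(File f) throws Exception {
          FileInputStream in = new FileInputStream(f);
          try {
              return new WordExtractor(in).getText();
          } finally {
              in.close();
          }
      }
  }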