[Mediawiki-l] Announcing Easy Lucene Search for Mediawiki (Robert Stojnic)

Chris Reigrut chris at reigrut.net
Thu May 7 22:03:58 UTC 2009


Thanks!

You are correct in assuming that this is designed for small-to-medium 
sized wikis--we have about 1000 users and a couple of hundred edits per 
day, but our scale testing indicated it would handle at least 20-30 
combined queries/updates per second (of normal-sized pages). 

I'm assuming that you mean "single-host" from an indexing server point of 
view, and yes, at this time that is completely correct.  Article 
indexing, however, can easily support multiple Mediawiki servers calling 
it.  Currently the attachment indexing relies on there only being a 
single Mediawiki server as well, but that's an easy modification.
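
To give a rough idea of what that looks like, the per-article update 
boils down to something like the sketch below.  This is illustrative 
only (written against a current Lucene API, with a made-up WikiIndexer 
class rather than the extension's actual code); the point is that 
updates are keyed on the article title, so any number of wiki servers 
can submit edits to the one indexing host.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class WikiIndexer {
    private final IndexWriter writer;

    public WikiIndexer(String indexPath) throws Exception {
        // One IndexWriter on the single indexing host; all wiki servers
        // submit their edits to it.
        writer = new IndexWriter(FSDirectory.open(Paths.get(indexPath)),
                                 new IndexWriterConfig(new StandardAnalyzer()));
    }

    public void indexArticle(String title, String wikitext) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("title", title, Field.Store.YES));
        doc.add(new TextField("text", wikitext, Field.Store.YES));
        // updateDocument deletes any previous copy of this title and adds
        // the new one, so re-submitting an edited page is safe.
        writer.updateDocument(new Term("title", title), doc);
        writer.commit();
    }
}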

There is a preload mechanism that grabs the pages directly from the 
database for indexing as well.  At some point I intend to combine the 
two, thereby keeping the real-time update but also providing a 
background indexer in case the real-time feed fails for some reason 
(therefore ensuring that no articles are missed).  For us a failed feed 
isn't a big problem, as we can reindex in about an hour.
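
If it helps, the preload pass is conceptually just a loop over the 
current revisions pulled straight from the wiki database and fed to the 
same indexer.  The sketch below (reusing the made-up WikiIndexer class 
from above) shows the idea; the SQL, connection details and paths are 
placeholders, and the exact schema depends on your MediaWiki version.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PreloadIndexer {
    // Illustrative only: this join matches the classic MediaWiki schema
    // (page -> revision -> text); note that old_text may be compressed or
    // stored externally depending on how the wiki is configured.
    private static final String QUERY =
        "SELECT p.page_title, t.old_text " +
        "FROM page p " +
        "JOIN revision r ON r.rev_id = p.page_latest " +
        "JOIN text t ON t.old_id = r.rev_text_id " +
        "WHERE p.page_namespace = 0";

    public static void main(String[] args) throws Exception {
        WikiIndexer indexer = new WikiIndexer("/var/lib/wiki-index");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/wikidb", "wiki", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                // Re-index the current revision of every article; the
                // real-time hook feed keeps things fresh between runs.
                indexer.indexArticle(rs.getString("page_title"),
                                     rs.getString("old_text"));
            }
        }
    }
}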


Robert Stojnic wrote:
> Very nice work Chris!
> I think it strikes a good balance of simplicity and flexibility that 
> makes it ideal for small-to-medium sites.
>
> The architecture itself seems to be similar to that used in early 
> mwsearch, where the index is updated via hooks that submit articles 
> directly to the indexer. So, it assumes a single-host architecture and 
> uses out-of-the-box Lucene scoring and highlighting as far as I can see. 
> I think the most interesting part for us is the handling of attachments. 
> I see you use Apache POI and PDFBox. We should really try to use this as 
> well; it shouldn't be too hard to do, but needs a bit of fiddling..
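
In case anyone wants to try the same thing: the attachment side boils 
down to extracting plain text with PDFBox and POI and then indexing 
that string exactly like article text.  A minimal sketch (illustrative 
names, not the extension's actual code; note the PDFBox package names 
have moved between releases):

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;   // org.apache.pdfbox.text in PDFBox 2.x
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.File;
import java.io.FileInputStream;

public class AttachmentText {
    // Pull plain text out of an uploaded file so it can go into the same
    // Lucene index as article text.  Only PDF and .doc are shown; other
    // formats would need their own POI extractor.
    public static String extract(File file) throws Exception {
        String name = file.getName().toLowerCase();
        if (name.endsWith(".pdf")) {
            PDDocument pdf = PDDocument.load(file);
            try {
                return new PDFTextStripper().getText(pdf);
            } finally {
                pdf.close();
            }
        } else if (name.endsWith(".doc")) {
            return new WordExtractor(new FileInputStream(file)).getText();
        }
        return "";
    }
}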



