Brion--
I did try to work with the existing Lucene search at the time, but I found it too complex, with too many other moving parts, especially when trying to implement the real-time portions, which were a real requirement for us. I'd actually sent you an email back then to chat about it, but it must have gotten lost.
I'd love to work with you guys on it, though, if we can find common ground.
Brion Vibber wrote:
On 5/7/09 12:33 PM, Chris Reigrut wrote:
I'd like to announce the first release of EzMwLucene. This project provides a simplified Lucene search for MediaWiki. It is designed to be easy to install, configure, and run. It provides real-time, multiple-field indexing and searching as well as text indexing of standard attachment types (pdf, xls, doc, ppt, vsd). The server is a self-contained Java application (no application server needed), and the client portion is a standard MediaWiki extension. It is currently in production on an internal site with over 1000 users running on MediaWiki 1.13.
Sounds neat! Indexing of uploaded document contents would be very useful for some sites, and it would be great to have better real-time index update options for those sites with low enough traffic to handle it.
https://sourceforge.net/projects/ezmwlucene/
I welcome all feedback: questions, suggestions and offers to help improve it!
Any interest in helping to roll the new features into the existing Lucene search that's already actively maintained for Wikimedia? This would maximize availability of new functionality and ensure it's used and maintained in the future.
-- brion
Very nice work, Chris! I think it strikes a good balance of simplicity and flexibility that makes it ideal for small-to-medium sites.
The architecture itself seems to be similar to that used in early mwsearch, where the index is updated via hooks that submit articles directly to the indexer. So it assumes a single-host architecture and uses out-of-the-box Lucene scoring and highlighting, as far as I can see. I think the most interesting part for us is the handling of attachments. I see you use Apache POI and PDFBox. We should really try to use these as well; it shouldn't be too hard to do, but needs a bit of fiddling.
r.
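For anyone following the thread who hasn't seen the hook-driven design Robert describes, here is a rough sketch of the idea: each page save or delete is pushed straight to the indexer, so results are searchable right away. This is not EzMwLucene or mwsearch code. The class name `HookIndexer` and its methods are made up for illustration, and a plain in-memory inverted index stands in for Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only: a tiny in-memory index updated directly by
// save/delete hooks. A real setup would hand these events to Lucene.
public class HookIndexer {
    // term -> titles of pages that currently contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();
    // title -> terms indexed for that page, so a re-save can drop stale terms
    private final Map<String, Set<String>> termsByTitle = new HashMap<>();

    // Lowercase the text and split it on anything that is not a letter or digit.
    private static Set<String> tokenize(String text) {
        Set<String> terms = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Would be called from the wiki's save hook (assumed): replaces the
    // page's old entry so the index reflects the new text immediately.
    public synchronized void onArticleSaved(String title, String text) {
        onArticleDeleted(title);
        Set<String> terms = tokenize(text);
        termsByTitle.put(title, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(title);
        }
    }

    // Would be called from the wiki's delete hook (assumed).
    public synchronized void onArticleDeleted(String title) {
        Set<String> old = termsByTitle.remove(title);
        if (old == null) {
            return;
        }
        for (String term : old) {
            Set<String> titles = postings.get(term);
            if (titles != null) {
                titles.remove(title);
                if (titles.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    // Single-term lookup; returns matching titles in alphabetical order.
    public synchronized List<String> search(String term) {
        Set<String> titles = postings.get(term.toLowerCase(Locale.ROOT));
        return titles == null ? new ArrayList<>() : new ArrayList<>(titles);
    }
}
```

Because every update goes through one in-process object, this only works when the wiki and the indexer can talk to each other directly, which is the single-host assumption mentioned above.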
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On 5/7/09 2:49 PM, Robert Stojnic wrote:
Very nice work, Chris! I think it strikes a good balance of simplicity and flexibility that makes it ideal for small-to-medium sites.
The architecture itself seems to be similar to that used in early mwsearch, where the index is updated via hooks that submit articles directly to the indexer. So it assumes a single-host architecture and uses out-of-the-box Lucene scoring and highlighting, as far as I can see. I think the most interesting part for us is the handling of attachments. I see you use Apache POI and PDFBox. We should really try to use these as well; it shouldn't be too hard to do, but needs a bit of fiddling.
Spiffy! :)
Bonus points for code sharing! ;)
-- brion