I've been looking at the Lucene search extensions for MediaWiki (the MWSearch front end and the Lucene-search back end), and I'm trying to make sense of how they fit together. Our primary goal is to index attachments (Office documents and PDFs), but we'd also like to be able to add index information for content outside of MediaWiki later. Can anyone confirm or deny the following?
Lucene-search is based on a periodic refresh of the indices (from a full wiki dump). While it is possible (when combined with the OAI extension) to add or update index information, that also appears to be time-based, since the IncrementalUpdater acts as a daemon, waking every 10 minutes by default. Search results also remain stale beyond that interval, because the snapshots that the searchers use are not refreshed (except by cron entries). Is this correct, and is there any way to have an index that is updated in real time?
With the OAI approach, additional tables are needed to store article ranking data, yet you still have to run another process (again, via cron) to actually update the ranks. Why is this?
It would appear that while Lucene has indexers for Office documents and PDFs, there is nothing in Lucene-search that makes use of them. Is that right?
Are there better current options for what we're looking for? Lucene is still marked experimental, Joda is marked as a demo, Sphinx doesn't appear to handle documents, and Extension:FileIndexer is pretty much a complete hack.
Any thoughts or advice would be greatly appreciated!
Christopher M. Reigrut
Applications Systems Architect
Key Technology Services / KeyBank
1000 South McCaslin Boulevard
Superior, Colorado 80027
720-304-1049
mediawiki-l@lists.wikimedia.org