[Mediawiki-l] Lucene

Christopher.Reigrut at Key.com Christopher.Reigrut at Key.com
Mon Nov 24 20:02:25 UTC 2008


I've been looking at the Lucene search extensions (MWSearch front end, 
Lucene-search back end) to Mediawiki, and I'm trying to make some sense of 
everything.  Our primary rationale is to be able to index attachments 
(Office documents and PDFs), but we'd also like the ability to add index 
information for stuff outside of Mediawiki later.  Can anyone confirm or 
deny the following:

1. Lucene-search is based on a periodic refresh of the indices (from a full
   wiki dump).  While it is possible (when combined with the OAI extension)
   to add/update index information, that too appears to be time-based
   (since the IncrementalUpdater acts as a daemon, waking every 10 minutes
   by default).  Also, search results remain stale even past that time,
   because the snapshots that the searchers use are not refreshed (except
   by cron entries).  Is this correct, and is there any way to have a
   real-time updated index?
2. With the OAI approach, additional tables are needed to store article
   ranking data.  Yet you still have to run another process (again, via
   cron) to actually update the ranks.  Why is this?
3. It would appear that while Lucene has indexers for those document types
   (Office documents and PDFs), there is nothing in Lucene-search to
   utilize them.
4. Are there better current options for what we're looking for?  Lucene is
   still marked experimental, Joda is marked as a demo, Sphinx doesn't
   appear to do documents, and Extension:FileIndexer is pretty much a
   complete hack.
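
To make the staleness concern concrete, here is a rough sketch of how I
understand the delay to add up.  It assumes a simple two-stage model (an edit
waits for the next IncrementalUpdater pass, then for the next snapshot
refresh); the function name and the hourly cron schedule are made up for
illustration, and only the 10-minute default comes from what I described
above.

```python
from datetime import timedelta


def worst_case_staleness(updater_interval, snapshot_interval):
    """Upper bound on the delay before an edit becomes searchable.

    Assumed model (not confirmed against Lucene-search itself): an edit
    made just after an updater pass waits one full updater interval to be
    indexed, then up to one full snapshot interval before searchers see it.
    """
    return updater_interval + snapshot_interval


# 10 minutes is the IncrementalUpdater default mentioned above.
updater = timedelta(minutes=10)
# Hypothetical cron schedule for refreshing the searcher snapshots.
snapshot = timedelta(hours=1)

print(worst_case_staleness(updater, snapshot))
```

If that model is right, shortening the cron interval only shrinks the second
term; the delay never reaches zero, which is why I am asking about a real-time
option.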

Any thoughts or advice would be greatly appreciated!

Christopher M. Reigrut 
Applications Systems Architect 
Key Technology Services / KeyBank 
1000 South McCaslin Boulevard 
Superior, Colorado 80027 
720-304-1049 


