I've been looking at the Lucene search extensions to MediaWiki (MWSearch
front end, Lucene-search back end), and I'm trying to make sense of
everything. Our primary goal is to be able to index attachments
(Office documents and PDFs), but we'd also like to be able to add index
information for content outside of MediaWiki later. Can anyone confirm or
deny the following:

1. Lucene-search is based on a periodic refresh of the indices (from a full
wiki dump). While it is possible (when combined with the OAI extension) to
add/update index information, that also appears to be time-based, since the
IncrementalUpdater acts as a daemon, waking every 10 minutes by default.
Also, search results remain stale even past that time, because the
snapshots that the searchers use are not refreshed (except by cron
entries). Is this correct, and is there any way to have a real-time
updated index?

2. With the OAI approach, additional tables are needed to store article
ranking data. Yet you still have to run another process (again, via cron)
to actually update the ranks. Why is this?

3. It would appear that while Lucene has indexers for those document types
(Office documents and PDFs), nothing in Lucene-search makes use of them.

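To make the staleness I'm asking about in the first point concrete, here is a rough sketch of the timing as I understand it. It only models the two delays described above: the IncrementalUpdater waking every 10 minutes, and searchers seeing changes only after the next cron-driven snapshot refresh. The function name and the 60-minute snapshot interval are my own assumptions for illustration, not Lucene-search settings.

```python
import math


def visible_at(edit_minute, updater_interval=10, snapshot_interval=60):
    """Return the minute at which an edit becomes searchable (assumed model)."""
    # The updater daemon picks the edit up at its next wake-up.
    indexed = math.ceil(edit_minute / updater_interval) * updater_interval
    # Searchers only see it once the next snapshot refresh runs (via cron).
    return math.ceil(indexed / snapshot_interval) * snapshot_interval


# An edit made one minute after a snapshot waits almost a full snapshot cycle.
print(visible_at(1))   # 60
# An edit indexed just after a snapshot misses it and waits for the next one.
print(visible_at(61))  # 120
```

If this model is right, the worst-case delay is roughly the updater interval plus the snapshot interval, so shortening only the 10-minute updater cycle would not help much.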
Are there better current options for what we're looking for? Lucene is
still marked experimental, Joda is marked as a demo, Sphinx doesn't appear
to do documents, and Extension:FileIndexer is pretty much a complete hack.
Any thoughts or advice would be greatly appreciated!
Christopher M. Reigrut
Applications Systems Architect
Key Technology Services / KeyBank
1000 South McCaslin Boulevard
Superior, Colorado 80027
720-304-1049