Hi,
On Monday 09 April 2007 20:59:36 Joshua Yeidel wrote:
> The approach suggested by Ian Smith below is the one adopted by a couple of systems I work with. For example, Microsoft SharePoint 2007 uses "iFilters" for each document type to extract indexable information. Mac OS X's "Spotlight" feature also has per-filetype "importers" for extracting indexable text.
>
> So the concept is not unworkable, but it seems to me to be a stretch for MediaWiki. MW is a Wiki-Page Management System™ <grin>, at which it excels; it's not a very good Document Management System, which is where Dave Sigafoos is apparently being driven (perhaps in slow stages) by his users.
>
> Perhaps Dave should investigate other document management approaches and a metasearch engine to search across multiple systems.
Well, or write an extension that implements his idea, e.g.:
* upon upload, run the document through an index generator (per file type)
* add that index text to some searchable index, or store it in the MySQL database (a sketch of both options follows after the commands below)
For each file type, you can do something like:

  pdftotext $pdf_file
  exif $image_file
  etc.
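To make that concrete, here is a minimal sketch of such a dispatcher; the /var/wiki-index directory, the file_index table, and the wiki/wikidb credentials are invented for illustration, not part of the original suggestion:

  #!/bin/sh
  # Sketch: extract text from one uploaded file, then index or store it.
  FILE="$1"
  case "$FILE" in
      *.pdf)        TEXT=$(pdftotext "$FILE" -) ;;  # PDF text to stdout
      *.jpg|*.jpeg) TEXT=$(exif "$FILE") ;;         # EXIF metadata as text
      *)            TEXT="" ;;                      # no filter for this type yet
  esac

  # Option 1: write the text to a directory that a search tool can index
  echo "$TEXT" > "/var/wiki-index/$(basename "$FILE").txt"

  # Option 2: store it in MySQL (real code needs proper SQL escaping;
  # doubling single quotes covers only the simplest cases)
  ESCAPED=$(printf '%s' "$TEXT" | sed "s/'/''/g")
  mysql -u wiki wikidb -e \
      "INSERT INTO file_index (filename, body) VALUES ('$FILE', '$ESCAPED')"

How you trigger this (an upload hook in the extension, or just a cron job over wiki/images) is left open, as in the original idea.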
There are very probably filters for doc, ppt, xls, etc. If not, one can always whip one up with a Perl module (I know modules exist for Word and Excel). While these might not capture all the formatting, they will extract the bulk (if not all) of the text, and you can then easily index and search it.
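A few concrete candidates would be the common Unix converters below; I'm assuming their availability here, the original post doesn't name any:

  antiword report.doc     # MS Word -> plain text
  catdoc report.doc       # alternative Word -> text converter
  xls2csv data.xls        # Excel -> CSV (ships with the catdoc package)
  catppt slides.ppt       # PowerPoint -> text (also from catdoc)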
Another option would be to just let the webserver handle this, by running htdig (or a Google Search Appliance?) over the uploaded files (which end up in wiki/images anyway) and presenting the user with a search box to search all these files. This second option wouldn't integrate with MediaWiki that nicely, though.
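For the htdig route, a minimal setup might look like the following; the paths and URL are assumptions, and note that ht://Dig needs an HTTP-reachable start point plus external parsers for non-HTML files:

  # wiki.conf -- minimal ht://Dig configuration (paths/URL invented)
  database_dir:  /var/lib/htdig/wiki
  start_url:     http://wiki.example.com/images/

  # build and merge the index, then point the htsearch CGI at wiki.conf
  htdig -i -c wiki.conf
  htmerge -c wiki.conf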
All the best,
Tels
--
Signed on Mon Apr 9 23:34:55 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or by email.
"One man in a thousand is a leader of men, the other 999 follow women"
-- Groucho Marx