-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Moin,
On Monday 09 April 2007 20:59:36 Joshua Yeidel wrote:
> The approach suggested by Ian Smith below is the one adopted by a couple
> of systems I work with. For example, Microsoft SharePoint 2007 uses
> "iFilters" for each document type to extract indexable information. Mac
> OS X's "Spotlight" feature also has per-filetype "importers" for
> extracting indexable text.
>
> So the concept is not unworkable, but it seems to me to be a stretch for
> MediaWiki. MW is a Wiki-Page Management System™ <grin>, at which it
> excels; it's not a very good Document Management System, which is where
> Dave Sigafoos is apparently being driven (perhaps in slow stages) by his
> users.
>
> Perhaps Dave should investigate other document management approaches and
> a metasearch engine to search across multiple systems.
Well, or write an extension that implements his idea, e.g.:
* upon upload, run the document through an index-generator (per file type)
* add that index-text to some searchable index, or store it in the MySQL
  database
For each file type, you can do something like:
pdftotext $pdf_file
exif $image_file
etc.
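
To make the per-file-type idea a bit more concrete, here is a minimal sketch in Python. It is not a MediaWiki extension, only an illustration of the dispatch step. The EXTRACTORS table and both function names are made up for this example, and it assumes the pdftotext and exif command-line tools are installed. The "-" argument asks pdftotext to write to stdout instead of a .txt file.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to an extraction command.
# "{path}" is a placeholder that gets replaced by the uploaded file's path.
EXTRACTORS = {
    ".pdf": ["pdftotext", "{path}", "-"],   # "-" = write text to stdout
    ".jpg": ["exif", "{path}"],
    ".jpeg": ["exif", "{path}"],
}


def extractor_command(path):
    """Return the argv list for extracting text from `path`, or None."""
    template = EXTRACTORS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [str(path) if part == "{path}" else part for part in template]


def extract_text(path):
    """Run the matching extractor and return its output ("" if none applies)."""
    cmd = extractor_command(path)
    if cmd is None:
        return ""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The external tool is not installed on this machine.
        return ""
    return result.stdout
```

An upload hook would then call extract_text() on the new file and hand the result to whatever index you use. Adding a new file type means adding one entry to the table.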
There are very probably filters for doc, ppt, xls, etc. If not, one can
always whip one up with a Perl module (I know modules exist for Office
and Excel). While these might not preserve all the formatting etc., they
will be able to extract the bulk (if not all) of the text, and you can then
easily index & search this text.
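
The "store it and search it" half could look roughly like the sketch below. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL, and a plain LIKE query rather than a real full-text index. The table name file_text and all function names are invented for this example.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the hypothetical table holding extracted text."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_text ("
        "filename TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def index_file(conn, filename, text):
    """Store (or overwrite) the extracted text for one uploaded file."""
    conn.execute(
        "INSERT OR REPLACE INTO file_text (filename, body) VALUES (?, ?)",
        (filename, text),
    )
    conn.commit()


def search(conn, term):
    """Return the names of files whose extracted text contains `term`."""
    # Escape LIKE wildcards so the term is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT filename FROM file_text "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY filename",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

For example, after index_file(conn, "Budget.pdf", text), a call to search(conn, "budget") would return that filename if the word appears in the extracted text. A real setup would more likely use the database's own full-text search instead of LIKE.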
Another option would be to just let the webserver handle this, by running
htdig (or a Google appliance?) over the uploaded files (which end up in
wiki/images, anyway) and presenting the user with a search box to search all
these files. This second option wouldn't integrate with MediaWiki that
nicely, though.
All the best,
Tels
- --
Signed on Mon Apr 9 23:34:55 2007 with key 0x93B84C15.
PGP key on
http://bloodgate.com/tels.asc or per email.
"One man in a thousand is a leader of men, the other 999 follow women"
-- Groucho Marx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
iQEVAwUBRhrO1HcLPEOTuEwVAQK1QQf7BT92eQPIzdE8JROOthZoLpFLdEfrzlpe
FO2rBOy6cN/D0eDN5ygCdsIgS7njrCyrxpsOqBA9osH63e2TJ4f7SZgFtV7z8EqP
S7ioCHiqa15Xe2P9fIZrq6wAh8DvuuBhkUMcx1J8310tDRjLAqY3RXoFv+bwYHE/
714eJAIJYqTFXP12QkWGvBmXg8M6fZ2apAeckM0+bpvXbCpmtqIRTfhvWttn9cOZ
gRYQb8iOOgpf66DouXUSkGX8dTU6vA/ws3R67M6PX63q6bkRd8sKJMehLNt1hcGO
r3OqMKVfAy1Y3BYaA0Ab/zhqNDM6EgqziefGWgZPSEoDEJj9OJP3sQ==
=RX1m
-----END PGP SIGNATURE-----