[Mediawiki-l] Storing or Linking Documents

Michael Daly mikedaly at magma.ca
Mon Apr 9 17:26:33 UTC 2007


Dave Sigafoos wrote:

> Unless MS removed every
> 'text' word from their document I don't see where an extension that
> could index the words would be a moving target.

I have a lot of old documents in IBM's Bookmanager format.  They are 
encrypted in such a way that no one can scan past the formatting info and 
find the text.  I rely on some old, buggy Bookmanager software to access 
them and expect that with another change of OS version I will lose the 
ability to use them.  IBM has never released the internal format 
specification for the documents and nothing I've been able to do has 
wrested the info from them.

I keep expecting MS to pull a similar stunt with Word.  They could sell 
the encryption as a "security" feature.

> Do you think that MS gives a crap whether MW is or is not capable of
> indexing word documents?  Unless, of course, you feel able to call
> Bill and have him change his format.

MS might give a crap about the laws currently being passed that 
mandate open standards for document storage.  The governments behind them 
don't want to have docs stored that become obsolete because of one vendor's 
decision to change their format.  This was what I was thinking of when I 
mentioned saving in an open standard.  Of course, the battle right now 
is between MS's version of an "open standard" and the open source 
community's desire for a truly open standard.

It's been in the computer news so much lately that I thought you'd get the 
drift of my comments.  I guess I shouldn't have been so obscure. Sorry!

> I believe that the market place will decide, rightly or wrongly (and who
> decides that?), what tools and "standards" will be used.

It looks like the elected reps will beat the market to it.  That, of 
course, brings its own risks and rewards.


> Of course right now it doesn't really matter as the only TYPE is IMAGE
> and MW doesn't appear to be able to search on it (which makes sense if
> the only type I wanted to store was IMAGE).

Searching on images is a major problem.  Do you search only on names of 
images, on descriptions or on content?  Searching by content is still an 
active area of research in image processing and recognition.

The only restriction on what you can upload is in the extension list in 
LocalSettings.php.  My wiki allows uploads of specific text file types. 
There's an extension I'm working on that processes them (a modification 
of an existing extension).  I don't see any reason why you couldn't make 
an extension that does the same with any chosen doc format.
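
For illustration only, here is a rough sketch of the LocalSettings.php side, assuming a fairly standard MediaWiki install. $wgEnableUploads and $wgFileExtensions are real MediaWiki settings; the particular extensions added ('txt' and 'doc') are just example choices, and actually indexing the uploaded files would still take an extension like the one described above. Treat it as a sketch, not a tested configuration.

```php
# LocalSettings.php additions (sketch): allow a few non-image uploads.
$wgEnableUploads = true;  // file uploads are disabled by default
// Append example document types to the existing allowed-extension list;
// 'txt' and 'doc' are illustrative choices only.
$wgFileExtensions = array_merge( $wgFileExtensions, array( 'txt', 'doc' ) );
```

Appending to the existing list, rather than overwriting it, keeps the default image types working.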

Mike


