My thought was that if we have the ability to add 'types' we could then define extensions to work with that 'type'.
There are a few formats that seem to be 'universal' whether we like it or not: Word, Excel, PowerPoint and PDF, plus emerging standards like those coming from OpenOffice.
Code documents (PHP, HTML, C, XML, etc.) could also be stored.
And it isn't so much " .. The problem is writing decoders for every document format in the world .." as writing them for a couple of the standards. For example, in the environment that I work in I have written several API examples to connect to a different database. Once a couple of good examples are there, others will be able to duplicate the process.
Also, I am not sure how many 'decoders' would be needed. For example, a Word document should be searchable without decoding it to plain text. Yes? Maybe not.
" .. couldn't we add that plaintext to the text indexed for the Image: page ..". This would work, but wouldn't it make more sense to have definitions of the 'document type'?
I realize that this is more than the wiki was intended for, but MW is such an incredible 'product' that I can see people using it more and more for business use.
Of course not all tools should be used for all situations. It just *seems* to me that document/documentation search/retrieval is a close fit.
Thanks for the follow-up.
DSig David Tod Sigafoos | SANMAR Corporation PICK Guy 206-770-5585 davesigafoos@sanmar.com
-----Original Message-----
From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Ian Smith
Sent: Sunday, April 08, 2007 8:30
To: MediaWiki announcements and site admin list; MediaWiki announcements and site admin list
Subject: Re: [Mediawiki-l] Storing or Linking Documents
Identifying the type isn't the problem - that's easy. The problem is writing decoders for every document format in the world, and hacking them into the existing MySQL-based search system.
Having said that, if we had to-plaintext converters for key doc formats, couldn't we add that plaintext to the text indexed for the Image: page? This could happen at save, and Wiki admins could configure converters by doc suffix.
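To make the "converters by doc suffix" idea concrete, here is a rough sketch in Python (not MW's PHP, and not an existing MediaWiki feature). Every name in it (CONVERTERS, register, text_for_index) is made up for illustration, and the two converters are deliberately crude stand-ins for real format decoders.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical registry: lower-case file suffix -> function returning plain text.
Converter = Callable[[bytes], str]
CONVERTERS: Dict[str, Converter] = {}


def register(suffix: str, converter: Converter) -> None:
    """Let an admin map a file suffix to a to-plaintext converter."""
    CONVERTERS[suffix.lower()] = converter


def plain_text(data: bytes) -> str:
    """Treat the upload as UTF-8 text, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def strip_tags(data: bytes) -> str:
    """Very crude markup stripper: replaces anything that looks like a tag."""
    return re.sub(r"<[^>]+>", " ", plain_text(data))


def text_for_index(filename: str, data: bytes) -> Optional[str]:
    """Return text to index at save time, or None if no converter is configured."""
    converter = CONVERTERS.get(Path(filename).suffix.lower())
    return converter(data) if converter else None


# Example configuration an admin might choose.
register(".txt", plain_text)
register(".html", strip_tags)
register(".xml", strip_tags)
```

With this configuration, text_for_index("foo.doc", ...) returns None, so a file with no configured converter is simply stored without extra indexed text.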
Of course, the true answer is still to browbeat our users into using wiki markup... ;-)
Ian
-----Original Message-----
From: Dave Sigafoos [mailto:davesigafoos@sanmar.com]
Sent: Saturday, April 07, 2007 06:45 PM Pacific Standard Time
To: MediaWiki announcements and site admin list
Subject: Re: [Mediawiki-l] Storing or Linking Documents
So how hard would it be to expand the upload process to allow selecting the 'type' of upload? The 'type' could then be searched, adding a good benefit to MW.
Also, since the upload process has a 'comment' that you can enter, wouldn't it make sense to allow a search against this comment? I do understand that searching the binary of an image really makes no sense (unless you are storing hidden text :) but allowing entry / search of keywords might be a good idea.
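As a rough illustration of searching on upload comments, here is a small keyword index in Python. It is only a sketch of the idea, not how MW's search works; the file names, comments and function names are invented for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> list:
    """Split text into lower-case alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_comment_index(comments: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the set of uploaded files whose comment contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for filename, comment in comments.items():
        for word in tokenize(comment):
            index[word].add(filename)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the files whose comment contains every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Invented upload comments, keyed by file name.
comments = {
    "Budget.xls": "2007 budget forecast, finance",
    "Logo.png": "company logo",
    "Plan.doc": "budget plan for finance team",
}
index = build_comment_index(comments)
print(sorted(search(index, "budget finance")))
```

The binary content of the files is never touched; only the comment text entered at upload is indexed.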
Thanks.
DSig David Tod Sigafoos | SANMAR Corporation PICK Guy 206-770-5585 davesigafoos@sanmar.com
-----Original Message-----
From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Jim Wilson
Sent: Friday, April 06, 2007 11:31
To: MediaWiki announcements and site admin list
Subject: Re: [Mediawiki-l] Storing or Linking Documents
The Image: namespace stores the meta-data for all uploaded files; I guess the "Image" name is based on history and how it's used in WP. But for those of us using MW for corporate nets, "Image:" means any uploaded file.
AFAIK, the namespace is called "Image" because that's what it's meant to store - images. Not video, not Excel spreadsheets, not Word docs.
Using the Image upload facility for something other than pure images represents an intentional circumvention of the spirit of the device (regardless of business needs - which I understand).
For the record, we have a wiki here where I work, and yes, people upload Excel spreadsheets and Word docs and PDFs and ZIP files and .... etc.
-- Jim
On 4/6/07, Ian Smith <ismith@good.com> wrote:
Dave Sigafoos:
I had gathered that images weren't searchable which makes sense to me (except for descriptive information) but I did not realize that a document with 'text' would not be searchable.
Documents are simply stored as-is in the filesystem; so, an uploaded Word doc ends up stored in c:\WebServer\mediawiki\images\f\f7\foo.doc. In contrast, Wiki pages are stored as fields in the MySQL database.
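The f\f7 part of that path is not arbitrary. As far as I know (this is my recollection of MediaWiki's hashed upload directories, not something stated in this thread), the two directory levels come from the MD5 hash of the stored file name: the first hex digit, then the first two. A minimal Python sketch of that layout follows; hashed_upload_path is a made-up helper, not a MediaWiki function.

```python
import hashlib
from pathlib import PureWindowsPath


def hashed_upload_path(upload_root: str, filename: str) -> PureWindowsPath:
    """Build root\\<h1>\\<h1h2>\\filename from the MD5 hex digest of the name.

    Assumption: the name is hashed exactly as given. MediaWiki normalises
    names first (for example capitalising the first letter), so the digits
    for a real upload may differ from what this sketch produces.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return PureWindowsPath(upload_root) / digest[0] / digest[:2] / filename


path = hashed_upload_path(r"c:\WebServer\mediawiki\images", "Foo.doc")
print(path)
```

The point of the layout is only to spread files over many directories; it says nothing about the file's content, which is why the path is of no help to search.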
Search doesn't work on uploaded documents, because:
1. the search uses the MySQL search facility, and so only works on stuff which is in the DB
2. since an uploaded doc could be in any format, there's no way to search it: e.g. if a document compresses its content using some proprietary scheme, there's no general way to look inside it.
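The compression point is easy to demonstrate. In this Python sketch, zlib stands in for "some proprietary scheme" (an assumption made purely for illustration): a naive byte search finds the keyword in the plain text, misses it in the compressed bytes, and finds it again only once the content has been decoded.

```python
import zlib


def raw_contains(blob: bytes, keyword: str) -> bool:
    """Naive search: look for the keyword's bytes directly in the stored blob."""
    return keyword.encode("utf-8") in blob


# Invented document text; repeated so that it compresses well.
text = ("Quarterly sales report for the northwest region. " * 20).encode("utf-8")
stored = zlib.compress(text)  # what a compressing format might put on disk

found_in_plain = raw_contains(text, "northwest")
found_in_stored = raw_contains(stored, "northwest")
found_after_decode = raw_contains(zlib.decompress(stored), "northwest")
print(found_in_plain, found_in_stored, found_after_decode)
```

Without a decoder that knows the format, the search only ever sees the middle case.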
Note that the problems go beyond search: features like "What links here" only work for links from Wiki pages, etc.
I do see now that it seems to put all uploaded 'media' to IMAGE: which I am not sure I understand.
The Image: namespace stores the meta-data for all uploaded files; I guess the "Image" name is based on history and how it's used in WP. But for those of us using MW for corporate nets, "Image:" means any uploaded file.
Believe me, I feel your pain... if you find a way to stop your users using Word for a single sentence of plain text, let me know. ;-)
Ian
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l