2010/11/25 bawolff bawolff+wn@gmail.com:
> Personally I think it would be nicer if you could associate source files with the final files.
Yeah, this was discussed a bit earlier in this thread. As far as I can tell, that approach adds a fair degree of complexity: it requires tracking a whole new class of files in association with other files, including versioning, deletion, etc. It also seems to presume that you'd never want to reference those same files using standard MediaWiki links. I don't see that such a system has clear advantages over linking to source files with normal wiki-links from appropriate places.
Stepping back a bit, I did some more research over the weekend into the current state of sourcing in Wikimedia Commons, and into which file types would be the most important to support.
Generally speaking, there's an existing (albeit limited) practice of adding sources that can be represented as simple plain-text files, such as POV-Ray, Gnuplot, etc. Sometimes these are formatted using the syntax-highlighting extension, sometimes not. This practice could be made more formal by directly requesting that users add source data when they specify that a file has been created using one of these applications (which is often identified using "Created with" templates). But I don't necessarily see that any additional software support is needed for these formats, save perhaps easier downloadability, which could be added to the syntax-highlighting extension.
For binary formats (and perhaps complex XML-based formats), the following stand out as being of high significance:
* .blend as Blender's native format and COLLADA as an open interchange format
* .xcf as GIMP's native format (preserving layers and other meta-information for bitmap images)
* .sla as Scribus' native format (XML, but files can get very large and have dependencies)
* .odt, .odp, .od as OpenDocument formats
* potentially OpenEXR and some other open interchange formats.
As far as I understand the pure security concerns (as opposed to content concerns), they fall primarily into two categories:
* client-side execution of unsafe formats using designated applications (embedded macros, references to other malicious content, etc.)
* exploitation of browser in-line display for purposes of XSS attacks or similar
Let me know if I'm missing a large category. I'm assuming server-side execution is not an issue for Wikimedia given correct server configuration.
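To make the second category a bit more concrete: one common mitigation for in-line display problems is to serve such files with headers that ask the browser to download rather than render them. The following is only a rough sketch of that idea. The function name is hypothetical, and none of this is taken from MediaWiki's actual configuration.

```python
from urllib.parse import quote


def download_headers(filename):
    """Build response headers that ask the browser to save a file.

    Hypothetical helper for illustration only; not an existing
    MediaWiki or web-server API.
    """
    # Percent-encode the name so spaces and non-ASCII characters are safe
    # inside the RFC 6266 "filename*" parameter.
    encoded_name = quote(filename, safe="")
    return {
        # A generic type gives the browser no reason to render the content.
        "Content-Type": "application/octet-stream",
        # "attachment" requests a download instead of in-line display.
        "Content-Disposition": f"attachment; filename*=UTF-8''{encoded_name}",
        # Ask the browser not to guess a different type from the content.
        "X-Content-Type-Options": "nosniff",
    }
```

This only speaks to the in-line display category. It does nothing about what happens once the user opens the downloaded file in its designated application.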
Full security for these and other conceivably useful binary formats (that is, making sure that nothing bad ever runs on a user's computer if they open a file) seems to me difficult to achieve. The restricted upload (or restricted attachment) approach builds on social trust to complement technical verification methods. We'd still have to invent some additional machinery to show security warnings before ever exposing such files directly to the user.
At the cost of easy management of individual files, I wonder if the most straightforward approach wouldn't be to write a decent ZIP handler (with directory display, and thumbnailing of included images, for purposes of patrolling), to disallow ZIP files that contain non-whitelisted file types, and to use ZIPs as the container for all complex, free-format source uploads. [[File:Bla source.zip]] could then just be referenced as part of the file description pages where relevant. Because some of the aforementioned binary formats are effectively archives, some of this work would likely be necessary anyway.
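As a rough illustration of the whitelist check, here is a minimal sketch using Python's standard zipfile module. The extension list and function names are made up for the example, and an actual handler would presumably live in MediaWiki itself. Note that this only looks at file names inside the archive; it does not verify that the contents match their extensions.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical whitelist; the real list would be a policy decision.
ALLOWED_EXTENSIONS = {".blend", ".dae", ".xcf", ".odt", ".odp", ".png", ".jpg"}


def disallowed_members(zip_bytes, allowed=ALLOWED_EXTENSIONS):
    """Return the names of archive members that fail the whitelist check."""
    rejected = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            path = PurePosixPath(info.filename)
            # Reject unknown extensions and names that could escape the
            # extraction directory (absolute paths or ".." components).
            if (
                path.suffix.lower() not in allowed
                or path.is_absolute()
                or ".." in path.parts
            ):
                rejected.append(info.filename)
    return rejected


def is_acceptable_source_zip(zip_bytes, allowed=ALLOWED_EXTENSIONS):
    """True if the upload is a readable ZIP with only whitelisted members."""
    try:
        return not disallowed_members(zip_bytes, allowed)
    except zipfile.BadZipFile:
        return False
```

Returning the list of rejected names, rather than just a yes/no answer, would let the upload form tell the user exactly which files need to be removed from the archive.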
That said, I'm not wedded to any particular approach. I hope we can identify reasonably simple steps that we can take to significantly expand our support for source files in the near term, because such files are essential for re-use.