Hello all,
for some types of resources, it's desirable to upload source files (whether it's Blender, COLLADA, Scribus, EDL, or some other format), so that others can more easily remix and process them. Currently, as far as I know, there's no way to upload these resources to Commons.
What would be the arguments against allowing administrators to upload arbitrary ZIP files on Wikimedia Commons, allowing the Commons community to develop policy and process around when such archived resources are appropriate? An alternative, of course, would be to whitelist every possible source format for admins, but it seems to me that it would be a good general policy to not enable additional support for formats that aren't officially supported (reduces confusion among users about what's permitted -- there's only one file format they can't use).
Thoughts?
Thanks, Erik
On 25.10.2010, 23:02 Erik wrote:
Hello all,
for some types of resources, it's desirable to upload source files (whether it's Blender, COLLADA, Scribus, EDL, or some other format), so that others can more easily remix and process them. Currently, as far as I know, there's no way to upload these resources to Commons.
What would be the arguments against allowing administrators to upload arbitrary ZIP files on Wikimedia Commons, allowing the Commons community to develop policy and process around when such archived resources are appropriate? An alternative, of course, would be to whitelist every possible source format for admins, but it seems to me that it would be a good general policy to not enable additional support for formats that aren't officially supported (reduces confusion among users about what's permitted -- there's only one file format they can't use).
Thoughts?
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format. Also, we need to check all contents with antivirus and disallow certain types of files inside archives (such as .exe). Once we have taken all these precautions, I see no need to restrict ZIPs to any special group. Of course, this doesn't mean that we should allow all the safe ZIPs, just several open ZIP-based file formats.
------------- [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=24230
On Mon, Oct 25, 2010 at 3:50 PM, Max Semenik maxsem.wiki@gmail.com wrote:
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format.
Does that bug even affect Wikimedia? We have uploads segregated on their own domain, where we don't set cookies or do anything else interesting, so what would an uploaded JAR file even do? If that kind of attack is still a problem even with separate domains, we can do like Mozilla's Bugzilla and serve each uploaded file from its own unique domain (that would have ramifications for how browsers fetch the images, but they might be positive anyway).
Aryeh Gregor wrote:
On Mon, Oct 25, 2010 at 3:50 PM, Max Semenik maxsem.wiki@gmail.com wrote:
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format.
Does that bug even affect Wikimedia? We have uploads segregated on their own domain, where we don't set cookies or do anything else interesting, so what would an uploaded JAR file even do? If that kind of attack is still a problem even with separate domains, we can do like Mozilla's Bugzilla and serve each uploaded file from its own unique domain (that would have ramifications for how browsers fetch the images, but they might be positive anyway).
Well, the fact that an attacker would not be able to steal the cookies if they could place a JAR file there* doesn't mean a malicious applet there isn't bad.
*Not sure if we can really assert that. Most likely it varies depending on browser, JVM and version.
Doing a full ZIP exploration against java classes is simple. However, we should check that everything there is clean, not that nothing there is blacklisted.
Archive formats have their own can of issues. We don't want people to upload an "OASIS file" that contains a videogame, even if it's not a JAR or a virus. How do we determine whether a file should be in the archive or not? What do we do with archived archives?
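As an illustration of the "full ZIP exploration against Java classes" mentioned above, here is a minimal sketch in Python using the standard `zipfile` module. It is not MediaWiki code: the function name `find_java_classes` is hypothetical, and it covers only the simple half of the argument (spotting compiled classes), not the stricter "everything must be clean" check.

```python
import io
import zipfile

# First four bytes of a compiled Java .class file.
JAVA_CLASS_MAGIC = b"\xca\xfe\xba\xbe"


def find_java_classes(zip_bytes):
    """Return names of archive members that look like compiled Java classes.

    A member is flagged if it starts with the class-file magic number or
    carries a .class extension, so a renamed class file is still caught.
    """
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Only the first four bytes are read; the member is not extracted.
            with archive.open(info) as member:
                header = member.read(4)
            if header == JAVA_CLASS_MAGIC or info.filename.lower().endswith(".class"):
                suspicious.append(info.filename)
    return suspicious
```

Checking the magic number rather than the file name alone is the design choice here: an uploader controls the names inside the archive, but a class loader cares about the content.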
On Mon, Oct 25, 2010 at 10:09 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Mon, Oct 25, 2010 at 3:50 PM, Max Semenik maxsem.wiki@gmail.com wrote:
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format.
Does that bug even affect Wikimedia? We have uploads segregated on their own domain, where we don't set cookies or do anything else interesting, so what would an uploaded JAR file even do?
upload.wikimedia.org could end up on Google's Safe Surfing (or however it's called) blacklist for hosting malicious .jars which are injected on another pwned web site or loaded through pwned advertising brokers. Given the fact that Java is the 2nd biggest exploit vector in terms of exploits (but 1st in terms of impact - users don't update Java as often as the Adobe Reader), it should not be allowed to upload JARs (or things that look like something else, but in fact can be loaded and executed by the JRE) to Wikipedia.
Marco
On Mon, Oct 25, 2010 at 10:51 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
On Mon, Oct 25, 2010 at 10:09 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Mon, Oct 25, 2010 at 3:50 PM, Max Semenik maxsem.wiki@gmail.com wrote:
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format.
Does that bug even affect Wikimedia? We have uploads segregated on their own domain, where we don't set cookies or do anything else interesting, so what would an uploaded JAR file even do?
upload.wikimedia.org could end up on Google's Safe Surfing (or however it's called) blacklist for hosting malicious .jars which are injected on another pwned web site or loaded through pwned advertising brokers. Given the fact that Java is the 2nd biggest exploit vector in terms of exploits (but 1st in terms of impact - users don't update Java as often as the Adobe Reader), it should not be allowed to upload JARs (or things that look like something else, but in fact can be loaded and executed by the JRE) to Wikipedia.
Marco
Should we also be exploring any possibly malicious archives inside archives recursively, or is it good enough to just make sure the archive itself is good?
Martijn Hoekstra wrote:
Should we also be exploring any possibly malicious archives inside archives recursively, or is it good enough to just make sure the archive itself is good?
I think that we should block such files. Also note that we can't recursively analyse everything, since that would allow someone to DoS us.
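To make the "block nested archives, don't recurse" position concrete, here is a hedged sketch in Python. The limits `MAX_MEMBERS` and `MAX_TOTAL_BYTES` and the function `check_archive_bounds` are hypothetical names with arbitrary values chosen for illustration. The sketch inspects only the archive's own directory and the first four bytes of each member, so the work done per upload stays bounded.

```python
import io
import zipfile

MAX_MEMBERS = 1000                    # hypothetical limit on entries
MAX_TOTAL_BYTES = 100 * 1024 * 1024   # hypothetical limit on declared size
ZIP_MAGIC = b"PK\x03\x04"             # local file header of a ZIP archive


def check_archive_bounds(zip_bytes, max_members=MAX_MEMBERS,
                         max_total=MAX_TOTAL_BYTES):
    """Return a list of reasons to reject; an empty list means no objection."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        infos = archive.infolist()
        if len(infos) > max_members:
            return ["too many members"]
        # Sizes come from the archive's directory and could be forged, so
        # this is a cheap first filter rather than a guarantee.
        if sum(info.file_size for info in infos) > max_total:
            return ["declared uncompressed size too large"]
        problems = []
        for info in infos:
            if info.is_dir():
                continue
            # Nested archives are rejected outright instead of being opened.
            with archive.open(info) as member:
                if member.read(4) == ZIP_MAGIC:
                    problems.append("nested archive: " + info.filename)
        return problems
```

Rejecting nested archives outright avoids the recursion question altogether, at the cost of refusing some legitimate files.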
On Tue, Oct 26, 2010 at 6:50 AM, Max Semenik maxsem.wiki@gmail.com wrote:
Instead of amassing social constructs around technical deficiency, I propose to fix bug 24230 [1] by implementing proper checking for JAR format. Also, we need to check all contents with antivirus and disallow certain types of files inside archives (such as .exe). Once we have taken all these precautions, I see no need to restrict ZIPs to any special group. Of course, this doesn't mean that we should allow all the safe ZIPs, just several open ZIP-based file formats.
If we only want ZIPs for several formats, we should check that they are of the expected type, _and_ that they consist of open file formats within the ZIP.
e.g. Office Open XML (the MS format) can include binary files for OLE objects and fonts (I think)
see "Table 2. Content types in a ZIP container"
http://msdn.microsoft.com/en-us/library/aa338205(office.12).aspx
OOXML can also include any other mimetype, which are registered _within_ the zip, and linked into the main content file.
afaics, allowing only safe ZIPs to be uploaded isn't difficult.
Expand the ZIP, and reject any ZIP which contains files that are on $wgFileBlacklist, or that are not on $wgFileExtensions + $wgZipFileExtensions.
$wgZipFileExtensions would consist of array('xml')
Then check the mimetypes of the files in the zip, against $wgMimeTypeBlacklist (with 'application/zip' removed), again allowing desired XML mimetypes through.
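The extension check described above can be sketched as follows. This is Python rather than MediaWiki's PHP, and the three sets are hypothetical stand-ins for $wgFileBlacklist, $wgFileExtensions and $wgZipFileExtensions with made-up contents. The MIME-type pass against $wgMimeTypeBlacklist is left out.

```python
import io
import os
import zipfile

# Hypothetical stand-ins for the settings named in the message above.
FILE_BLACKLIST = {"exe", "php", "js", "jar", "class"}
FILE_EXTENSIONS = {"png", "gif", "jpg", "jpeg", "svg"}
ZIP_FILE_EXTENSIONS = {"xml"}


def rejected_members(zip_bytes):
    """Return archive members that are blacklisted or not whitelisted."""
    allowed = FILE_EXTENSIONS | ZIP_FILE_EXTENSIONS
    rejected = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Extension without the dot, lower-cased; empty if there is none.
            ext = os.path.splitext(info.filename)[1].lstrip(".").lower()
            if ext in FILE_BLACKLIST or ext not in allowed:
                rejected.append(info.filename)
    return rejected
```

An upload would be refused whenever the returned list is non-empty. Members without any extension are rejected in this sketch; OpenDocument packages contain an extension-less `mimetype` member, so a real whitelist would need an explicit rule for such entries.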
-- John Vandenberg
On 10/25/2010 12:02 PM, Erik Moeller wrote:
Hello all,
for some types of resources, it's desirable to upload source files (whether it's Blender, COLLADA, Scribus, EDL, or some other format), so that others can more easily remix and process them. Currently, as far as I know, there's no way to upload these resources to Commons.
What would be the arguments against allowing administrators to upload arbitrary ZIP files on Wikimedia Commons, allowing the Commons community to develop policy and process around when such archived resources are appropriate? An alternative, of course, would be to whitelist every possible source format for admins, but it seems to me that it would be a good general policy to not enable additional support for formats that aren't officially supported (reduces confusion among users about what's permitted -- there's only one file format they can't use).
Thoughts?
Thanks, Erik
It's ideal if we actually support these formats, so we can do things like thumbnails, basic metadata etc. Failing that, it's better to support a given file extension than it is to support zip files. This way, if in 'the future' we add support for X file format, then we have X format files stored consistently so we can support representation of that file format.
If we add blanket support for 'throw whatever you want' into a zip file, it will be difficult to give a quality representation of that asset in the future (other than as a zip file with multiple sub assets).
If for example someone writes a diff engine for representing 3d model transformations, we won't as easily be able to plug-in that tool, if we don't have a consistent storage model for that file format.
That being said, there may be some composite asset sets that lack container systems, in which case it would not be bad to support some open container format.
The number of formats or multimedia asset compositing systems that are not web representable with JavaScript engines or natively supported in the browser should be on a dramatic decline in the next decade, so best to just focus on support for such formats.
For example, we prefer SVG uploads to a zip file with Illustrator assets, because SVG is representable in the browser, there are JavaScript-based engines for editing SVG [http://svg-edit.googlecode.com/svn/branches/2.4/editor/svg-editor.html] etc. Likewise for 3D model representation with the COLLADA format (although that is much more in its infancy at this point in time).
--michael
On Mon, Oct 25, 2010 at 1:05 PM, Michael Dale mdale@wikimedia.org wrote:
It's ideal if we actually support these formats, so we can do things like thumbnails, basic metadata etc. Failing that, it's better to support a given file extension than it is to support zip files. This way, if in 'the future' we add support for X file format, then we have X format files stored consistently so we can support representation of that file format.
If we add blanket support for 'throw whatever you want' into a zip file, it will be difficult to give a quality representation of that asset in the future (other than as a zip file with multiple sub assets).
I tend to agree that it's preferable to be able to recognize and validate formats; though as noted sometimes you're going to have stuff that doesn't really fit well in an individual file.
Certainly for Wikibooks I could envision *all sorts* of totally legitimate use for being able to upload/download various files, including archives. The Blender handbook could use example files and projects to download, which might include dozens of support files. A programming module might need to provide source code and sample input files.
Then we have the 'media source file' case: an animation should be able to include the Blender or POV-Ray or whatever sources that were used to create it. A pretty picture built in a layered raster system like Gimp or Photoshop would do better to include the source .xcf or .psd than not to, even if the source file is in a format that's harder to work with.
I believe we've got an old bug on the idea of being able to explicitly attach a source file: https://bugzilla.wikimedia.org/show_bug.cgi?id=17012
In all cases we have the worry that if we allow uploading those funky formats, we'll either a) end up with malicious files or b) end up with lazy people using and uploading non-free editing formats when we'd prefer them to use freely editable formats. I'm not sure I like the idea of using admin powers to control being able to upload those, though; bottlenecking content reviews as a strict requirement can be problematic on its own.
What I'd probably like to see is a more wide-open allowal of arbitrary 'source files' which can be uploaded as attachments to standalone files. We could give them more limited access: download only, no inline viewing, only allowed if DLs are on separate safe domain, etc.
I don't really relish the thought of checking image source data for warez archives, though. :) Can't guarantee a magic solution there.
-- brion
2010/10/25 Brion Vibber brion@pobox.com:
In all cases we have the worry that if we allow uploading those funky formats, we'll either a) end up with malicious files or b) end up with lazy people using and uploading non-free editing formats when we'd prefer them to use freely editable formats. I'm not sure I like the idea of using admin powers to control being able to upload those, though; bottlenecking content reviews as a strict requirement can be problematic on its own.
Yeah, I don't like the bottleneck approach either, but in the absence of better systems, it may be the best way to go as an immediate solution. We could do it for a list of whitelisted open formats that are requested by the community. And we'd see from usage which file types we need to prioritize proper support/security checks for.
What I'd probably like to see is a more wide-open allowal of arbitrary 'source files' which can be uploaded as attachments to standalone files. We could give them more limited access: download only, no inline viewing, only allowed if DLs are on separate safe domain, etc.
It seems fairly straightforward to me to say: "These free file formats are permitted to be uploaded. We haven't developed fully sophisticated security checks for them yet, so we're asking trusted users to do basic sanity checks until we've developed automatic checks." We can then prod people to convert any proprietary formats into free ones that are on that whitelist. And if they're free formats, I'm not sure why they shouldn't be first-class citizens -- as Michael mentioned, that makes it possible to plop in custom handlers at a later time. A COLLADA handler for 3D files may seem like a remote possibility, but it's certainly within the realm of sanity. ZIP files would have to be specially treated so they're only allowed if they contain only files in permitted formats.
So, consistent with Michael's suggestion, we could define a 'restricted-upload' right, initially given to admins only but possibly expanded to other users, which would allow files from the "potentially insecure" list of extensions to be uploaded, and for ZIP files, would ensure that only accepted file types are contained within the archive. The resultant review bottleneck would simply be a reflection that we haven't gotten around to adding proper support for these file types yet. On the plus side, we could add restricted upload support for new open formats as soon as there's consensus to do so.
The main downside I would see is that users might end up being confused why these files get uploaded. To mitigate this, we could add a "This file has a restricted filetype. Files of this type can currently only be uploaded by administrators for security reasons" note on file description pages.
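The 'restricted-upload' idea above amounts to a small permission rule plus a notice on the description page. A minimal sketch in Python, not MediaWiki code: the extension sets, the `upload` right name and the functions `may_upload` and `restricted_notice` are all hypothetical, and the ZIP content check is assumed to happen elsewhere.

```python
# Hypothetical lists; the real ones would come from community consensus.
OPEN_EXTENSIONS = {"png", "jpg", "svg", "ogg"}
RESTRICTED_EXTENSIONS = {"zip", "odt", "ods", "blend", "dae"}

NOTICE = ("This file has a restricted filetype. Files of this type can "
          "currently only be uploaded by administrators for security reasons")


def extension_of(filename):
    """Return the lower-cased extension without the dot, or '' if none."""
    return filename.rsplit(".", 1)[-1].lower() if "." in filename else ""


def may_upload(user_rights, filename):
    """Decide whether a user holding the given rights may upload the file."""
    ext = extension_of(filename)
    if ext in OPEN_EXTENSIONS:
        return "upload" in user_rights
    if ext in RESTRICTED_EXTENSIONS:
        return "restricted-upload" in user_rights
    return False  # anything not listed stays forbidden for everyone


def restricted_notice(filename):
    """Return the description-page note for restricted types, else None."""
    return NOTICE if extension_of(filename) in RESTRICTED_EXTENSIONS else None
```

Moving a format from the restricted set to the open set once proper checks exist would then be a one-line configuration change.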
@2010-10-26 03:45, Erik Moeller:
2010/10/25 Brion Vibber brion@pobox.com:
In all cases we have the worry that if we allow uploading those funky formats, we'll either a) end up with malicious files or b) end up with lazy people using and uploading non-free editing formats when we'd prefer them to use freely editable formats. I'm not sure I like the idea of using admin powers to control being able to upload those, though; bottlenecking content reviews as a strict requirement can be problematic on its own.
Yeah, I don't like the bottleneck approach either, but in the absence of better systems, it may be the best way to go as an immediate solution. We could do it for a list of whitelisted open formats that are requested by the community. And we'd see from usage which file types we need to prioritize proper support/security checks for.
What I'd probably like to see is a more wide-open allowal of arbitrary 'source files' which can be uploaded as attachments to standalone files. We could give them more limited access: download only, no inline viewing, only allowed if DLs are on separate safe domain, etc.
It seems fairly straightforward to me to say: "These free file formats are permitted to be uploaded. We haven't developed fully sophisticated security checks for them yet, so we're asking trusted users to do basic sanity checks until we've developed automatic checks." We can then prod people to convert any proprietary formats into free ones that are on that whitelist. And if they're free formats, I'm not sure why they shouldn't be first-class citizens -- as Michael mentioned, that makes it possible to plop in custom handlers at a later time. A COLLADA handler for 3D files may seem like a remote possibility, but it's certainly within the realm of sanity. ZIP files would have to be specially treated so they're only allowed if they contain only files in permitted formats.
So, consistent with Michael's suggestion, we could define a 'restricted-upload' right, initially given to admins only but possibly expanded to other users, which would allow files from the "potentially insecure" list of extensions to be uploaded, and for ZIP files, would ensure that only accepted file types are contained within the archive. The resultant review bottleneck would simply be a reflection that we haven't gotten around to adding proper support for these file types yet. On the plus side, we could add restricted upload support for new open formats as soon as there's consensus to do so.
The main downside I would see is that users might end up being confused why these files get uploaded. To mitigate this, we could add a "This file has a restricted filetype. Files of this type can currently only be uploaded by administrators for security reasons" note on file description pages.
ODS, ODT and such should be fairly easy to check, at least on a basic level. A very basic check would be to see whether the file contains a "Basic" or "Scripts" folder. A bit more advanced would be to check whether manifest.xml contains "application/binary" (to catch anyone who tried to change the default naming) and whether any file contains "<script:module" (for the same reason). If any of these is true, then there should be a warning.
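The three checks above are simple enough to sketch directly. The following Python is an illustration only: the function name `odf_macro_warnings` and the warning strings are hypothetical. It reads every member in full, so it would need a size limit in front of it in practice.

```python
import io
import zipfile


def odf_macro_warnings(odf_bytes):
    """Return warnings for signs of macros in an OpenDocument package."""
    warnings = []
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as archive:
        for info in archive.infolist():
            name = info.filename
            # Check 1: anything stored under a top-level Basic or Scripts folder.
            if name.startswith(("Basic/", "Scripts/")):
                warnings.append("macro folder member: " + name)
            if info.is_dir():
                continue
            data = archive.read(info)
            # Check 2: the manifest declares a binary entry.
            if name.endswith("manifest.xml") and b"application/binary" in data:
                warnings.append("binary entry declared in " + name)
            # Check 3: a script module hidden under a non-default name.
            if b"<script:module" in data:
                warnings.append("script module in " + name)
    return warnings
```

As suggested in the message, a non-empty result would produce a warning rather than an automatic rejection, since these are heuristics.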
I think we should also support Dia for diagrams and XCF for layered bitmaps. Don't know much about XCF, but Dia is a simple XML file (which might be zipped) and so shouldn't be dangerous at all. I guess it could even be unzipped upon loading because Dia supports both zipped and unzipped versions alike. There is/was also Extension:Dia which generates thumbnails... It seems to work fine even with 1.16 from the trunk and the latest Dia version. It doesn't work with zipped Dia files but this would be manageable.
Regards, Nux.
[Kicking this thread back to life, full-quoting below only for quick reference.]
I've collected some additional notes on this here: http://commons.wikimedia.org/wiki/Commons:Restricted_uploads
Would appreciate feedback & will circulate further in the Commons community.
Thanks, Erik
2010/10/25 Erik Moeller erik@wikimedia.org:
2010/10/25 Brion Vibber brion@pobox.com:
In all cases we have the worry that if we allow uploading those funky formats, we'll either a) end up with malicious files or b) end up with lazy people using and uploading non-free editing formats when we'd prefer them to use freely editable formats. I'm not sure I like the idea of using admin powers to control being able to upload those, though; bottlenecking content reviews as a strict requirement can be problematic on its own.
Yeah, I don't like the bottleneck approach either, but in the absence of better systems, it may be the best way to go as an immediate solution. We could do it for a list of whitelisted open formats that are requested by the community. And we'd see from usage which file types we need to prioritize proper support/security checks for.
What I'd probably like to see is a more wide-open allowal of arbitrary 'source files' which can be uploaded as attachments to standalone files. We could give them more limited access: download only, no inline viewing, only allowed if DLs are on separate safe domain, etc.
It seems fairly straightforward to me to say: "These free file formats are permitted to be uploaded. We haven't developed fully sophisticated security checks for them yet, so we're asking trusted users to do basic sanity checks until we've developed automatic checks." We can then prod people to convert any proprietary formats into free ones that are on that whitelist. And if they're free formats, I'm not sure why they shouldn't be first-class citizens -- as Michael mentioned, that makes it possible to plop in custom handlers at a later time. A COLLADA handler for 3D files may seem like a remote possibility, but it's certainly within the realm of sanity. ZIP files would have to be specially treated so they're only allowed if they contain only files in permitted formats.
So, consistent with Michael's suggestion, we could define a 'restricted-upload' right, initially given to admins only but possibly expanded to other users, which would allow files from the "potentially insecure" list of extensions to be uploaded, and for ZIP files, would ensure that only accepted file types are contained within the archive. The resultant review bottleneck would simply be a reflection that we haven't gotten around to adding proper support for these file types yet. On the plus side, we could add restricted upload support for new open formats as soon as there's consensus to do so.
The main downside I would see is that users might end up being confused why these files get uploaded. To mitigate this, we could add a "This file has a restricted filetype. Files of this type can currently only be uploaded by administrators for security reasons" note on file description pages. -- Erik Möller Deputy Director, Wikimedia Foundation
Erik Moeller wrote:
I've collected some additional notes on this here: http://commons.wikimedia.org/wiki/Commons:Restricted_uploads
Would appreciate feedback & will circulate further in the Commons community.
From a social and technical perspective, this proposal is horribly hackish.
The over-arching goal should be to implement fewer hacks, though we obviously don't live in an ideal world.
Given the current parameters, this is probably the best solution. However, there needs to be a more in-depth analysis of the potential security implications of some of these file types. Even trusted users shouldn't be able to upload files that allow for the arbitrary injection of PHP, for example. I suppose that's why you're asking for more feedback from wikitech-l.
The current proposal is vague about which specific file types are desired. A concrete list ought to be generated so that people can research the known security implications of allowing those file types to be uploaded.
I don't think there is ever going to be (or ever should be) a generic whitelist to allow any and all free/open file types. What are the specific file types that are currently banned that you're seeking to have partially unbanned?
MZMcBride
On Thu, Nov 25, 2010 at 12:46 AM, Erik Moeller erik@wikimedia.org wrote:
[Kicking this thread back to life, full-quoting below only for quick reference.]
I've collected some additional notes on this here: http://commons.wikimedia.org/wiki/Commons:Restricted_uploads
Would appreciate feedback & will circulate further in the Commons community.
I think you are taking the wrong approach here, although I agree with MZMcBride's reply to your mail "From a social and technical perspective, this proposal is horribly hackish. [...] Given the current parameters, this is probably the best solution. [...]"
I believe that we should really be aiming to scan for security vulnerabilities and reject only those files that pose a vulnerability. For example, we now outright reject OpenOffice files, as they may encapsulate files that will be executed by the JVM. We should be able to determine the exact circumstances that pose a vulnerability and only reject those files, similar to what we have done for the embedded HTML in files that affects IE.
Bryan
On 25 November 2010 07:58, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
I think you are taking the wrong approach here, although I agree with MZMcBride's reply to your mail "From a social and technical perspective, this proposal is horribly hackish. [...] Given the current parameters, this is probably the best solution. [...]"
The rock and hard place here are:
1. This solution is horribly hacky and bletcherous.
2. The ideal is the enemy of the actually adequate; at present things are not adequate.
Do we have a clear picture of what the ideal looks like? Are the hacks clearly on the path to that, and sure not to obstruct it in any way?
- d.
Erik Moeller wrote:
[Kicking this thread back to life, full-quoting below only for quick reference.]
I've collected some additional notes on this here: http://commons.wikimedia.org/wiki/Commons:Restricted_uploads
Would appreciate feedback & will circulate further in the Commons community.
Thanks, Erik
How do you expect the end users to send the files? Uploading to a service like megaupload? As email attachments? Via OTRS? Using a toolserver app?
Seems like a use case for the upload stash. Allow the users to upload the file, but require approval before it is finally publicly shown. We could even show the files publicly, as long as there's no direct download, requiring downloaders to provide a session token in the process.
In any case, files treated as html by IE would still need to be disallowed.