Hi,
a recent discussion in https://bugzilla.wikimedia.org/show_bug.cgi?id=65724#c3 revealed that parts of the SVG standard are deliberately broken on Commons. While I see some reasons not to adhere fully to the standard (e.g. external resources might break over time if they are moved or deleted), I don't feel it's good to break the standard as hard as is done right now. It puts a burden on creators and on the principle of sharing within the Wikimedia environment; overall, it is even technically inferior and leads, or might lead, to useless duplication of content.
The SVG standard defines an image element. The image resource is linked to using the xlink:href attribute. Optionally, the image is embedded into the SVG using the data URI scheme (https://en.wikipedia.org/wiki/Data_URI_scheme).
Combining SVGs with traditional bitmap images is useful in several ways: it lets creators share the way an image was manipulated, and it eases future modifications that are hard or even impossible to make with traditional bitmap/photo editing. It basically has the same advantages that mash-up web content has over static content: each layer or element can be modified individually without destroying the other elements. It's easy to see that a proper SVG offers more to its potential users than a classic JPG or PNG with only one layer that is the result of all image operations.
These reasons point out the necessity of barrier-free access to the image element.
Currently, Commons cripples this access, which is laid out in the standard and was originally implemented by "librsvg": it disables the handling of HTTP(S) resources. Users needing the same bitmap in more than one SVG are forced to base64-embed their source, and hence duplicate it, in each individual SVG. Indeed, this puts quite some burden on creators and on the Wikimedia servers, which duplicate lots of data right now and potentially even more in the future. Note that this duplication of data goes unnoticed by the routines that are in place specifically for bitmaps right now, which check uploads for MD5 collisions and reject the upload when a duplicate is detected. Space might be cheap as long as donations are flowing, but reverting bad practice once it is common is harder than promoting good practice /now/ by adhering to the standard as closely as possible.
Therefore I advocate changing librsvg in one of the two ways laid out in comment 3 of the bug report given above and (re)supporting linking to external bitmaps in SVGs. Two strategies that come to mind to prevent the disappearance of an external resource on the web are:
1) cache external refs on thumbnail generation, check for updates on the external server on thumbnail re-generation
2) allow external refs to images residing on Wikimedia servers only
Point 2) should be considered the easiest implementation; 1) is harder to implement but gives even more freedom to SVG creators and would adhere more closely to the SVG standard. However, another argument for 2) would be the licensing issue: it ensures that only images that have been properly licensed by Commons users and the upload process are linked to (and if a license violation is detected and the linked-to bitmap is removed from Commons, the SVG using such a bitmap breaks gracefully).
Regards, Christian
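The point about duplicated bitmaps escaping the whole-file MD5 check can be illustrated with a minimal sketch. It assumes the bitmaps are embedded as base64 data URIs on SVG image elements; the function names are hypothetical and nothing here reflects how MediaWiki actually inspects uploads.

```python
import base64
import hashlib
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
SVG_IMAGE = "{http://www.w3.org/2000/svg}image"
# Only the simple "data:<mime>;base64,<payload>" form is recognised in this sketch.
DATA_URI = re.compile(r"^data:(?P<mime>[^;,]*);base64,(?P<payload>.*)$", re.DOTALL)


def embedded_bitmap_hashes(svg_text):
    """Return the MD5 hex digest of every base64-embedded bitmap in one SVG."""
    digests = []
    for image in ET.fromstring(svg_text).iter(SVG_IMAGE):
        href = image.get(XLINK_HREF) or image.get("href") or ""
        match = DATA_URI.match(href.strip())
        if match is None:
            continue  # an external link or another URI form; nothing embedded
        raw = base64.b64decode(match.group("payload"))
        digests.append(hashlib.md5(raw).hexdigest())
    return digests


def find_shared_bitmaps(svgs):
    """Map bitmap digest -> sorted names of the SVG files embedding that bitmap."""
    users = defaultdict(set)
    for name, svg_text in svgs.items():
        for digest in embedded_bitmap_hashes(svg_text):
            users[digest].add(name)
    return {d: sorted(names) for d, names in users.items() if len(names) > 1}
```

Two SVGs that embed the same bitmap but differ in any other markup have different whole-file hashes, so only a check on the decoded payloads, as above, would notice the shared data.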
On 05/27/2014 12:37 PM, "Christian Müller" wrote:
> Point 2) should be considered the easiest implementation; 1) is harder to implement but gives even more freedom to SVG creators and would adhere more closely to the SVG standard. However, another argument for 2) would be the licensing issue: it ensures that only images that have been properly licensed by Commons users and the upload process are linked to (and if a license violation is detected and the linked-to bitmap is removed from Commons, the SVG using such a bitmap breaks gracefully).
The problem with either is that, short of installing a very complicated and brittle full URL parser in the SVG validation code, you open the door to a number of very nearly insurmountable (and highly catastrophic) security issues, the most important of which is that you then allow anyone able to upload an image the capability to force either the client or (worse) the image scalers to perform an arbitrary GET on the projects -- including such things as API calls simply by viewing or processing an image.
Even stringent validation is brittle and opens a number of hard to track security vulnerabilities.
-- Marc
On 05/27/2014 03:08 PM, Marc A. Pelletier wrote:
> The problem with either is that, short of installing a very complicated and brittle full URL parser in the SVG validation code, you open the door to a number of very nearly insurmountable (and highly catastrophic) security issues, the most important of which is that you then allow anyone able to upload an image the capability to force either the client or (worse) the image scalers to perform an arbitrary GET on the projects -- including such things as API calls simply by viewing or processing an image.
First, I think this is avoidable, if we limit it to upload.wikimedia.org and bits.wikimedia.org (see below).
But hypothetically, say the API is accessible on one of those hosts. Can't this be addressed with a simple wall clock timeout? If it takes more than X seconds to fetch the Wikimedia HTTP(S) resource, fail the thumbnail generation.
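A wall-clock limit of this kind could be sketched as follows. This is only an illustration under assumptions: `read_with_deadline` and `FetchTimeout` are hypothetical names, and the check runs between chunks, so it complements rather than replaces a per-read socket timeout.

```python
import time


class FetchTimeout(Exception):
    """Raised when a fetch exceeds its overall wall-clock budget."""


def read_with_deadline(stream, budget_seconds, chunk_size=64 * 1024, clock=time.monotonic):
    """Read a file-like object to the end, giving up once the budget is spent.

    The deadline is checked before each chunk, so a single read() call that
    blocks forever is not interrupted by this function alone.
    """
    deadline = clock() + budget_seconds
    chunks = []
    while True:
        if clock() > deadline:
            raise FetchTimeout(f"gave up after {budget_seconds} s")
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

With the standard library, the stream could be the response object returned by `urllib.request.urlopen(url, timeout=...)`; that `timeout` argument bounds individual blocking socket operations, while the budget above bounds the transfer as a whole.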
With regard to other issues (e.g. API calls *doing* something), correct me if I'm wrong, but I don't see a security issue. Our API is designed such that it's impossible to either log in or take any write action using a GET request. (If there were a bug in this, it would have other serious ramifications, and would need to be fixed in the API proper.)
> Even stringent validation is brittle and opens a number of hard to track security vulnerabilities.
It's true that blacklists are doomed, and that complex whitelists are hard to read and maintain. But a whitelist that e.g. allowed only //upload.wikimedia.org/ and //bits.wikimedia.org/ may not be problematic.
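A deliberately small whitelist along these lines might look like the sketch below. The two hosts are simply the ones named above and should be read as placeholders; `is_allowed_href` is a hypothetical helper, and it says nothing about whether librsvg would parse the same string the same way.

```python
from urllib.parse import urlsplit

# Hosts named in the message above; treat the list as a placeholder.
ALLOWED_HOSTS = frozenset({"upload.wikimedia.org", "bits.wikimedia.org"})


def is_allowed_href(href, allowed_hosts=ALLOWED_HOSTS):
    """Accept only https:// or protocol-relative URLs on an exactly matching host."""
    try:
        parts = urlsplit(href.strip())
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme not in ("", "https"):
        return False
    if parts.username is not None or parts.password is not None or port is not None:
        return False  # no userinfo tricks, no explicit ports
    host = parts.hostname
    return host is not None and host.lower() in allowed_hosts
```

Matching the full host name exactly, rather than by prefix or suffix, is what keeps look-alike hosts such as `upload.wikimedia.org.evil.example` out.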
Matt Flaschen
I agree that a simple whitelist might be workable, but it does depend on a bit of code auditing of librsvg to ensure that it can be done robustly. --scott
On 05/27/2014 09:05 PM, C. Scott Ananian wrote:
> I agree that a simple whitelist might be workable, but it does depend on a bit of code auditing of librsvg to ensure that it can be done robustly.
That works to protect the image scalers, if correct, but it would do nothing to protect the clients, would it?
-- Marc
On 05/27/2014 09:09 PM, Marc A. Pelletier wrote:
> On 05/27/2014 09:05 PM, C. Scott Ananian wrote:
>> I agree that a simple whitelist might be workable, but it does depend on a bit of code auditing of librsvg to ensure that it can be done robustly.
> That works to protect the image scalers, if correct, but it does nothing to protect the clients, would it?
If the SVG is blocked at upload time, other users will not be able to download it, so that would address anything that can be statically checked (e.g. URLs).
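The kind of static check meant here can be sketched in a few lines. This is an assumption-laden illustration, not MediaWiki's validation code: it only looks at href and xlink:href attributes (not CSS url() references), `external_references` is a hypothetical name, and the standard-library XML parser is not hardened against hostile input.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def external_references(svg_text):
    """List (tag, value) for every href that is neither a data: URI nor a #fragment."""
    found = []
    for element in ET.fromstring(svg_text).iter():
        for name, value in element.attrib.items():
            if name not in ("href", f"{{{XLINK_NS}}}href"):
                continue
            target = value.strip()
            if target.startswith("#") or target.lower().startswith("data:"):
                continue  # same-document reference or embedded payload
            # Strip the "{namespace}" prefix so the caller sees a plain tag name.
            found.append((element.tag.rsplit("}", 1)[-1], target))
    return found
```

An upload handler could reject the file whenever this list is non-empty, or pass each value to a host whitelist first.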
If you're referring to the long-running GET issue, we would have to see how browsers handle things (i.e. whether it just keeps loading, times it out, hangs the browser preventing you from closing the tab, etc.).
Matt Flaschen
How would this work for non-WMF wikis? What about executing JavaScript that is posted to an approved wiki? This would make XSS and a whole host of other problems a lot easier to do. So we whitelist commons.wikimedia.org; what's stopping a user from making a user subpage with some JS code that executes something arbitrary? Leaving SVG without external media is honestly the best way of doing it. Would you really trust a file that can load just about anything it wants arbitrarily?
On Tue, May 27, 2014 at 9:05 PM, C. Scott Ananian cananian@wikimedia.org wrote:
> I agree that a simple whitelist might be workable, but it does depend on a bit of code auditing of librsvg to ensure that it can be done robustly. --scott
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 05/27/2014 09:11 PM, John wrote:
> How would this work for non-WMF wikis?
It could be configurable, and default to only allowing content under the image upload path on the local wiki (if it's enabled at all).
> What about executing JavaScript that is posted to an approved wiki? This would make XSS and a whole host of other problems a lot easier to do. So we whitelist commons.wikimedia.org; what's stopping a user from making a user subpage with some JS code that executes something arbitrary?
I specifically said bits.wikimedia.org and upload.wikimedia.org (and not commons.wikimedia.org), neither of which host user JavaScript.
Matt Flaschen
On May 27, 2014 11:28 PM, "Matthew Flaschen" mflaschen@wikimedia.org wrote:
> On 05/27/2014 09:11 PM, John wrote:
>> How would this work for non-WMF wikis?
> It could be configurable, and default to only allowing content under the image upload path on the local wiki (if it's enabled at all).
>> What about executing JavaScript that is posted to an approved wiki? This would make XSS and a whole host of other problems a lot easier to do. So we whitelist commons.wikimedia.org; what's stopping a user from making a user subpage with some JS code that executes something arbitrary?
> I specifically said bits.wikimedia.org and upload.wikimedia.org (and not commons.wikimedia.org), neither of which host user JavaScript.
> Matt Flaschen
Gadgets are on bits and they are user controlled. Ditto for mediawiki:common.js et al. (unless you mean users as in non-admins). I see no use case for allowing bits. If someone wants an extension asset they can upload it.
Personally I don't understand where this conversation is going: why is JS even a concern for librsvg? Does it even support that? Surely XSS isn't the main concern about fetching random files from the image scalers.
I'm not really familiar with what the threat case is for restricting SVGs, but making image scalers not be able to access the wider network seems like it would reduce the attack surface significantly.
For the client side of things, I haven't looked at the SVG validation code recently, but don't we have roughly the same security concerns both before and after this?
--bawolff
On 05/27/2014 10:52 PM, Brian Wolff wrote:
>> I specifically said bits.wikimedia.org and upload.wikimedia.org (and not commons.wikimedia.org), neither of which host user JavaScript.
>> Matt Flaschen
> Gadgets are on bits and they are user controlled. Ditto for mediawiki:common.js et al. (unless you mean users as in non-admins). I see no use case for allowing bits. If someone wants an extension asset they can upload it.
You're right, I was completely wrong about the user JavaScript. Actually, user scripts are on bits too. Conceivably, we could limit it to directories starting with static-..., but that starts getting complicated. It's probably safer to limit it to user-uploaded Commons files as you said.
Matt Flaschen
On Tue, May 27, 2014 at 10:10 PM, Matthew Flaschen mflaschen@wikimedia.org wrote:
> On 05/27/2014 10:52 PM, Brian Wolff wrote:
>>> I specifically said bits.wikimedia.org and upload.wikimedia.org (and not commons.wikimedia.org), neither of which host user JavaScript.
>>> Matt Flaschen
>> Gadgets are on bits and they are user controlled. Ditto for mediawiki:common.js et al. (unless you mean users as in non-admins). I see no use case for allowing bits. If someone wants an extension asset they can upload it.
> You're right, I was completely wrong about the user JavaScript. Actually, user scripts are on bits too. Conceivably, we could limit it to directories starting with static-..., but that starts getting complicated. It's probably safer to limit it to user-uploaded Commons files as you said.
It *should* be difficult to get javascript to run inside an image-- you would have to find an element that we allow that interprets javascript source. If anyone comes up with a way, I'd be very interested in hearing about it. If the javascript is already in an svg, then it's much easier to get it to execute.
But overall it's much safer to just not allow it, which is why we currently don't.
> Matt Flaschen
On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" cmue81@gmx.de wrote:
> Hi,
> a recent discussion in https://bugzilla.wikimedia.org/show_bug.cgi?id=65724#c3 revealed that parts of the SVG standard are deliberately broken on Commons. While I see some reasons not to adhere fully to the standard (e.g. external resources might break over time if they are moved or deleted), I don't feel it's good to break the standard as hard as is done right now. It puts a burden on creators and on the principle of sharing within the Wikimedia environment; overall, it is even technically inferior and leads, or might lead, to useless duplication of content.
I'm far more concerned about the security/privacy issues than concern about an external resource going away. The checks that you're hitting are likely the security checks we do on the svg.
> The SVG standard defines an image element. The image resource is linked to using the xlink:href attribute. Optionally, the image is embedded into the SVG using the data URI scheme (https://en.wikipedia.org/wiki/Data_URI_scheme).
> Combining SVGs with traditional bitmap images is useful in several ways: it lets creators share the way an image was manipulated, and it eases future modifications that are hard or even impossible to make with traditional bitmap/photo editing. It basically has the same advantages that mash-up web content has over static content: each layer or element can be modified individually without destroying the other elements. It's easy to see that a proper SVG offers more to its potential users than a classic JPG or PNG with only one layer that is the result of all image operations.
> These reasons point out the necessity of barrier-free access to the image element.
> Currently, Commons cripples this access, which is laid out in the standard and was originally implemented by "librsvg": it disables the handling of HTTP(S) resources. Users needing the same bitmap in more than one SVG are forced to base64-embed their source, and hence duplicate it, in each individual SVG. Indeed, this puts quite some burden on creators and on the Wikimedia servers, which duplicate lots of data right now and potentially even more in the future. Note that this duplication of data goes unnoticed by the routines that are in place specifically for bitmaps right now, which check uploads for MD5 collisions and reject the upload when a duplicate is detected. Space might be cheap as long as donations are flowing, but reverting bad practice once it is common is harder than promoting good practice /now/ by adhering to the standard as closely as possible.
> Therefore I advocate changing librsvg in one of the two ways laid out in comment 3 of the bug report given above and (re)supporting linking to external bitmaps in SVGs. Two strategies that come to mind to prevent the disappearance of an external resource on the web are:
> 1) cache external refs on thumbnail generation, check for updates on the external server on thumbnail re-generation
> 2) allow external refs to images residing on Wikimedia servers only
> Point 2) should be considered the easiest implementation; 1) is harder to implement but gives even more freedom to SVG creators and would adhere more closely to the SVG standard. However, another argument for 2) would be the licensing issue: it ensures that only images that have been properly licensed by Commons users and the upload process are linked to (and if a license violation is detected and the linked-to bitmap is removed from Commons, the SVG using such a bitmap breaks gracefully).
Having our servers do arbitrary calls to external resources (option 1) isn't a realistic option from a security perspective. There are some fun poc svg files that abuse this to scan a server's dmz, attack other sites with sql injections, etc. Trusting an image library to correctly speak http without a memory corruption seems a little scary as well, but I'll admit I haven't looked at librsvg's code myself.
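One common precaution against the internal-network scanning described above is to refuse any target address that is not globally routable. The sketch below assumes the caller has already resolved the host name and checks the address it will actually connect to; `is_safe_target` is a hypothetical helper, and it is a partial mitigation at best.

```python
import ipaddress


def is_safe_target(address):
    """Return True only for globally routable IP addresses."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP literal at all
    # is_global is False for loopback, private, link-local and similar ranges.
    return ip.is_global
```

A check like this does not help against the memory-corruption concern, which is about the HTTP client code itself.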
From a privacy perspective, we also don't want to allow the situation where a reader's device is reaching out to a server that we don't control. So if someone includes a link to the original SVG on a webpage, and there are any major browsers that will pull those resources in and let an attacker see the user's IP address, we shouldn't allow that... hmm, and now that I read the bug, I see this is Firefox's behavior in the image you uploaded. We probably want to block that behavior.
Allowing a whitelist of WMF domains via https may be possible. In general, the security checking we do on uploaded files is complex enough that I don't like adding another layer of specific checks and exceptions, but if we can find a relatively simple way to do it that maintains our security and privacy requirements, then I wouldn't stand in the way.
> Regards, Christian
While we're talking about SVGs, I'll note that our current handling of the 'lang' property on image references is at odds with how HTML and browsers implement it. See https://bugzilla.wikimedia.org/show_bug.cgi?id=58920 for more details.
We can't support 'lang' in SVG served directly to browsers/renderers (which parsoid would like to do) until this issue is addressed. --scott
ps. WRT bitmap embeds, I think the larger issue is whether images on Commons are expected to be self-contained or not. The current infrastructure assumes that they are. It might be worth thinking of the 'big picture' and allowing a more general dependency mechanism -- if an SVG depends on a bitmap hosted on Commons, then thumbnails of that SVG should be invalidated/regenerated when the bitmap changes, etc.
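The dependency mechanism suggested in the postscript can be pictured as a reverse index from each file to the files that link to it. The sketch below is purely illustrative: `DependencyIndex` and its methods are hypothetical names, file titles stand in for whatever identifiers the real infrastructure would use, and nothing like this is claimed to exist in MediaWiki.

```python
from collections import defaultdict, deque


class DependencyIndex:
    """Tracks which files link to which other files, by title."""

    def __init__(self):
        # target title -> titles of the files that depend on it
        self._dependents = defaultdict(set)

    def record(self, file_title, depends_on):
        """Register that file_title links to every title in depends_on."""
        for target in depends_on:
            self._dependents[target].add(file_title)

    def stale_after_change(self, changed_title):
        """Return every file whose thumbnails need regenerating, found breadth-first."""
        stale, queue = set(), deque([changed_title])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in stale:
                    stale.add(dependent)
                    queue.append(dependent)
        return stale
```

The traversal is transitive on purpose: if one SVG links to another SVG that links to a bitmap, a change to the bitmap marks both SVGs as stale.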
Sent: Tuesday, 27 May 2014 at 21:21
From: "Chris Steipp" csteipp@wikimedia.org
To: "Wikimedia developers" wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] SVG linking of external images/bitmaps - xlink:href should support http(s) resources
> On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" cmue81@gmx.de wrote:
> [..] Trusting an image library to correctly speak http without a memory corruption seems a little scary as well, but I'll admit I haven't looked at librsvg's code myself.
In any case, it'd be the image library to fix. Restricting access is an arguably crude workaround due to diffuse fears. It breaks the standard and makes technology less useful to its users.
> [..], if there are any major browsers that will pull those resources in and let an attacker see the user's IP address, we shouldn't allow that... hmm, and now that I read the bug, I see this is Firefox's behavior in the image you uploaded. We probably want to block that behavior.
Yeah, Firefox's decision to adhere fully to the SVG standard is right imho, since it has to measure itself against other browsers in compatibility tests.
If WP decides to cripple the standard for security reasons, that's their business, but please don't start crippling users' browsers. Security there is in the hands of the users: they have to decide which browser to use, and whether that ought to be a security-enhanced one with less standards compliance or a full-featured one like FF.
> Allowing a whitelist of WMF domains via https may be possible. In general, the security checking we do on uploaded files is complex enough that I don't like adding another layer of specific checks and exceptions, but if we can find a relatively simple way to do it that maintains our security and privacy requirements, then I wouldn't stand in the way.
Ok, within WP scope, hosting external dependency files on foreign servers is out of reach, security- and longevity-wise; it seems everyone agrees on this.
As far as I am concerned, two issues remain that are achievable in the short term:
1) allow certain WMF domains via https for thumbnail generation and librsvg processing in general. This is to adhere to the SVG standard as long as dependent files remain in the Wikimedia universe. (Is there a chance for this to make it into 1.24git?)
2) fix chunked upload so that it does not bail out on chunks that consist exclusively of base64-encoded data. Currently, valid files that include such a chunk fail on upload, with an unusable error description.
Farther off might be the need to rethink part of the file infrastructure, to either broadly allow formats that are not self contained OR make a strong and reasoned decision against that and document it for wikipedians. This has been suggested here: http://lists.wikimedia.org/pipermail/wikitech-l/2014-May/076700.html
Regards, Christian
ps:
On Thu, Jun 19, 2014 at 11:15 PM, "Christian Müller" cmue81@gmx.de wrote:
> Sent: Tuesday, 27 May 2014 at 21:21
> From: "Chris Steipp" csteipp@wikimedia.org
> To: "Wikimedia developers" wikitech-l@lists.wikimedia.org
> Subject: Re: [Wikitech-l] SVG linking of external images/bitmaps - xlink:href should support http(s) resources
>> On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" cmue81@gmx.de wrote:
>> [..] Trusting an image library to correctly speak http without a memory corruption seems a little scary as well, but I'll admit I haven't looked at librsvg's code myself.
> In any case, it'd be the image library to fix. Restricting access is an arguably crude workaround due to diffuse fears. It breaks the standard and makes technology less useful to its users.
>> [..], if there are any major browsers that will pull those resources in and let an attacker see the user's IP address, we shouldn't allow that... hmm, and now that I read the bug, I see this is Firefox's behavior in the image you uploaded. We probably want to block that behavior.
> Yeah, Firefox's decision to adhere fully to the SVG standard is right imho, since it has to measure itself against other browsers in compatibility tests.
> If WP decides to cripple the standard for security reasons, that's their business, but please don't start crippling users' browsers. Security there is in the hands of the users: they have to decide which browser to use, and whether that ought to be a security-enhanced one with less standards compliance or a full-featured one like FF.
I meant that because those browsers are fully implementing the spec, MediaWiki needs to protect our users' privacy in case that is used. We have no influence over Firefox development, and I agree, the browsers should implement the spec. We just need to ensure we are taking precautions in that context.
>> Allowing a whitelist of WMF domains via https may be possible. In general, the security checking we do on uploaded files is complex enough that I don't like adding another layer of specific checks and exceptions, but if we can find a relatively simple way to do it that maintains our security and privacy requirements, then I wouldn't stand in the way.
> Ok, within WP scope, hosting external dependency files on foreign servers is out of reach, security- and longevity-wise; it seems everyone agrees on this.
> As far as I am concerned, two issues remain that are achievable in the short term:
> 1) allow certain WMF domains via https for thumbnail generation and librsvg processing in general. This is to adhere to the SVG standard as long as dependent files remain in the Wikimedia universe. (Is there a chance for this to make it into 1.24git?)
Like I said, if someone can find a simple way to do this, we can allow it in MediaWiki. If someone wants to work on it, one of the first steps is to get the security/privacy requirements defined (along with the function requirements, like cscott brought up in the reference below). Most have been brought up here or on that bug, but someone should distill those somewhere.
> 2) fix chunked upload so that it does not bail out on chunks that consist exclusively of base64-encoded data. Currently, valid files that include such a chunk fail on upload, with an unusable error description.
This will unfortunately require a different approach to how we do stashed/chunked uploads. Currently, each chunk is actually available from the server as a file. So each piece has to be checked for xss vectors, which is why your chunks currently fail. The stash will need to be inaccessible to end users.
> Farther off might be the need to rethink part of the file infrastructure, to either broadly allow formats that are not self contained OR make a strong and reasoned decision against that and document it for wikipedians. This has been suggested here: http://lists.wikimedia.org/pipermail/wikitech-l/2014-May/076700.html
> Regards, Christian
> ps: