On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" cmue81@gmx.de wrote:
Hi,
a recent discussion in https://bugzilla.wikimedia.org/show_bug.cgi?id=65724#c3
revealed that parts of the SVG standard are deliberately broken on Commons. While I see some reasons not to adhere fully to the standard (e.g. external resources might break over time if they are moved or deleted), I don't think it's good to break the standard as severely as is done right now. It puts a burden on creators and on the principle of sharing within the Wikimedia environment, and overall it is even technically inferior and leads, or might lead, to useless duplication of content.
I'm far more concerned about the security/privacy issues than about an external resource going away. The checks that you're hitting are likely the security checks we do on the SVG.
The SVG standard defines an image element. The image resource is linked to using the xlink:href attribute. Optionally, the image is embedded into the SVG using the data URI scheme (https://en.wikipedia.org/wiki/Data_URI_scheme).
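To make the two variants concrete, here is a minimal Python sketch that builds an SVG whose image element either links to a bitmap or embeds it as a base64 data URI. The helper names (make_svg, to_data_uri) and the example.org URL are assumptions for illustration only; nothing here is taken from librsvg or MediaWiki.

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def to_data_uri(data: bytes, mime: str = "image/png") -> str:
    """Encode raw bitmap bytes as a base64 data URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def make_svg(href: str, width: int = 100, height: int = 100) -> str:
    """Return an SVG document containing one <image> that points at href.

    href may be an external URL (linking) or a data URI (embedding).
    """
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    size = {"width": str(width), "height": str(height)}
    svg = ET.Element(f"{{{SVG_NS}}}svg", size)
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {f"{{{XLINK_NS}}}href": href, **size})
    return ET.tostring(svg, encoding="unicode")


# Variant 1: link to an external bitmap (hypothetical URL).
linked = make_svg("https://example.org/photo.png")

# Variant 2: embed the bitmap bytes directly (placeholder bytes, not a real PNG).
embedded = make_svg(to_data_uri(b"abc"))
```

The linked variant stays small but depends on the external resource; the embedded variant is self-contained but carries a full copy of the bitmap.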
Combining SVGs with traditional bitmap images is useful in several ways: It allows creators to share the way an image is manipulated and eases future modifications that are hard or even impossible to do using traditional bitmap/photo editing. It basically has the same advantages that mash-up web content has over static content: Each layer or element can be modified individually without destroying the other elements. It's easy to see that a proper SVG offers more to its potential users than a classic JPG or PNG with only one layer, which is merely the result of all image operations.
These reasons point out the necessity for barrier-free access to the image element.
Currently, Commons cripples this access, which is laid out in the standard and was originally implemented by "librsvg". It disables the handling of HTTP(S) resources. Users needing the same bitmap in more than one SVG are forced to base64-embed their source, and hence duplicate it, in each individual SVG. Indeed, this puts quite some burden on creators and on Wikimedia servers, which duplicate lots of data right now and potentially even more in the future. Note that this duplication of data goes unnoticed by the routines currently in place specifically for bitmaps, which check uploads for MD5 collisions and reject an upload when a duplicate is detected. Space might be cheap as long as donations are flowing, but reverting bad practice once it is common is harder than promoting good practice /now/ by adhering to the standard as closely as possible.
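To illustrate why a whole-file hash misses this kind of duplication, here is a rough Python sketch that hashes the decoded payload of each base64-embedded image instead of the SVG file itself. It is not how MediaWiki's upload checks work; the function names are hypothetical, SHA-256 is an arbitrary choice, and a real implementation would need a hardened XML parser (e.g. defusedxml) for untrusted uploads.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_IMAGE = "{http://www.w3.org/2000/svg}image"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def embedded_digests(svg_text):
    """Yield a SHA-256 hex digest for each base64 data URI on an <image>."""
    # Assumption: input is trusted; ElementTree is not safe for hostile XML.
    root = ET.fromstring(svg_text)
    for image in root.iter(SVG_IMAGE):
        href = image.get(XLINK_HREF, "")
        if not href.startswith("data:"):
            continue  # linked, not embedded
        header, _, payload = href.partition(",")
        if not header.endswith(";base64"):
            continue  # other data URI encodings are ignored in this sketch
        yield hashlib.sha256(base64.b64decode(payload)).hexdigest()


def find_shared_bitmaps(svgs):
    """Map digest -> sorted file names, for bitmaps embedded in more than one SVG.

    svgs is a dict of file name -> SVG source text.
    """
    seen = defaultdict(set)
    for name, text in svgs.items():
        for digest in embedded_digests(text):
            seen[digest].add(name)
    return {d: sorted(names) for d, names in seen.items() if len(names) > 1}
```

Two SVGs that differ only in their vector overlay hash differently as files, yet a check like this would report the bitmap they both carry.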
Therefore I advocate changing librsvg in one of the two ways laid out in comment 3 of the bug report given above and (re)supporting links to external bitmaps in SVGs. Two strategies that come to mind to prevent the disappearance of an external resource on the web are:
1) cache external refs on thumbnail generation, check for updates on the external server on thumbnail re-generation
2) allow external refs to images residing on Wikimedia servers only
Point 2) should be the easiest to implement; 1) is harder to implement but gives even more freedom to SVG creators and would adhere more closely to the SVG standard. However, another argument for 2) would be the licensing issue: It ensures that only images are linked to that have been properly licensed by Commons users and the upload process (and if a license violation is detected and the linked-to bitmap is removed from Commons, the SVG using such a bitmap breaks gracefully).
Having our servers do arbitrary calls to external resources (option 1) isn't a realistic option from a security perspective. There are some fun PoC SVG files that abuse this to scan a server's DMZ, attack other sites with SQL injections, etc. Trusting an image library to correctly speak HTTP without a memory corruption seems a little scary as well, but I'll admit I haven't looked at librsvg's code myself.
From a privacy perspective, we also don't want to allow the situation where a reader's device is reaching out to a server that we don't control. So if someone includes a link to the original SVG on a webpage, and there are any major browsers that will pull those resources in and let an attacker see the user's IP address, we shouldn't allow that... hmm, and now that I read the bug, I see this is Firefox's behavior in the image you uploaded. We probably want to block that behavior.
Allowing a whitelist of WMF domains via HTTPS may be possible. In general, the security checking we do on uploaded files is complex enough that I don't like adding another layer of specific checks and exceptions, but if we can find a relatively simple way to do it that maintains our security and privacy requirements, then I wouldn't stand in the way.
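As a rough idea of what such a whitelist check could look like, here is a Python sketch that accepts only same-document references, embedded data URIs, and HTTPS URLs on an allowed host. The ALLOWED_HOSTS value and the function names are assumptions for illustration, not the actual MediaWiki upload checks, and the sketch only looks at href attributes (it ignores CSS url() references, entities, and so on).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Assumption: the set of hosts considered safe; the real list would be a policy decision.
ALLOWED_HOSTS = {"upload.wikimedia.org"}


def href_allowed(href: str) -> bool:
    """Return True if href is local, embedded, or HTTPS on an allowed host."""
    href = href.strip()
    if href.startswith("#"):
        return True  # reference inside the same document
    if href.lower().startswith("data:"):
        return True  # embedded content, no network access needed
    try:
        parts = urlsplit(href)
    except ValueError:
        return False  # malformed URL
    # hostname is lowercased and excludes any userinfo or port
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def disallowed_hrefs(svg_text: str) -> list:
    """List every href / xlink:href value in the SVG that fails the check."""
    # Assumption: a hardened parser would replace ElementTree for real uploads.
    root = ET.fromstring(svg_text)
    bad = []
    for element in root.iter():
        for attr in (XLINK_HREF, "href"):
            value = element.get(attr)
            if value is not None and not href_allowed(value):
                bad.append(value)
    return bad
```

Comparing the parsed hostname against an exact set, rather than matching substrings of the URL, avoids lookalike hosts such as an allowed name used as a subdomain prefix or placed in the userinfo part.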
Regards, Christian
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l