On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" <cmue81(a)gmx.de>
wrote:
Hi,
a recent discussion in
https://bugzilla.wikimedia.org/show_bug.cgi?id=65724#c3
revealed that parts of the SVG standard are deliberately broken on
commons. While I see some reasons to not adhere fully to the standard,
e.g. external resources might break over time, if they are moved or
deleted, I don't feel it's good to break the standard as hard as it's done
right now. It puts a burden on creators, on the principle of sharing
within the wikimedia environment and overall, it's even technically
inferior and leads or might lead to useless duplication of content.
I'm far more concerned about the security/privacy issues than about
an external resource going away. The checks that you're hitting are likely
the security checks we do on the svg.
The SVG standard defines an image element. The image resource is linked
to using the xlink:href attribute. Optionally the image is embedded into
the SVG using the Data URI scheme
(https://en.wikipedia.org/wiki/Data_URI_scheme).
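For illustration, the two ways of referencing a bitmap described above can be sketched in Python as follows. The helper names, the example.org URL and the placeholder bytes are assumptions made for this sketch; none of them come from the bug report.

```python
import base64
from xml.sax.saxutils import quoteattr

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def svg_with_image(href: str, width: int = 100, height: int = 100) -> str:
    """Return a minimal SVG document whose single <image> points at href."""
    return (
        f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}" '
        f'width="{width}" height="{height}">'
        # quoteattr() escapes the value and adds the surrounding quotes
        f'<image xlink:href={quoteattr(href)} '
        f'width="{width}" height="{height}"/>'
        f'</svg>'
    )


def data_uri(payload: bytes, mime: str = "image/png") -> str:
    """Encode raw bytes as a base64 data: URI."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


# Placeholder bytes standing in for a real PNG file (assumption)
bitmap = b"\x89PNG\r\n\x1a\n" + b"placeholder"

# Variant 1: the bitmap is linked to (hypothetical URL)
linked = svg_with_image("https://example.org/photo.png")

# Variant 2: the bitmap is embedded into the SVG itself
embedded = svg_with_image(data_uri(bitmap))
```

The linked variant stays small no matter how large the bitmap is, while the embedded variant carries a full base64 copy of it.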
Combining SVGs with traditional bitmap images is useful in several ways:
It allows creators to share the way an image is manipulated and eases future
modifications that are hard or even impossible to do using traditional
bitmap/photo editing. It basically has the same advantages that mash-up
web content has over static content: Each layer or element can be modified
individually without destroying the other elements. It's easy to see that
a proper SVG is worth more to its potential users than a classic JPG or PNG with
only one layer, being the result of all image operations.
These reasons point out the necessity for barrier-free access to the image
element.
Currently, commons cripples this access laid out in the standard and
originally implemented by "librsvg". It disables the handling of HTTP(S)
resources. Users needing the same bitmap in more than one SVG are forced
to base64-embed their source, and hence duplicate it, in each individual
SVG. Indeed, there is quite some burden on creators and on wikimedia
servers that duplicate lots of data right now and potentially even more in
the future. Note that this duplication of data goes unnoticed by the
routines specifically in place for bitmaps right now, that check uploads on
MD5 collision and reject the upload on dup detection. Space might be cheap
as long as donations are flowing, but reverting bad practice once it is
common is harder than promoting good practice /now/ by adhering to the
standard as closely as possible.
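A small Python sketch of why whole-file hashing misses this kind of duplication. MD5 is used only because the paragraph above names it; the same holds for any hash computed over the complete file. The function name, captions and placeholder bytes are assumptions for illustration.

```python
import base64
import hashlib


def embed(bitmap: bytes, caption: str) -> bytes:
    """Return an SVG file (as bytes) that base64-embeds bitmap."""
    uri = "data:image/png;base64," + base64.b64encode(bitmap).decode("ascii")
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink">'
        f'<image xlink:href="{uri}" width="100" height="100"/>'
        f'<text x="0" y="120">{caption}</text>'
        '</svg>'
    )
    return svg.encode("utf-8")


# Placeholder bytes standing in for one shared bitmap (assumption)
bitmap = b"\x89PNG\r\n\x1a\n" + bytes(1000)

svg_a = embed(bitmap, "Version A")
svg_b = embed(bitmap, "Version B")

# Whole-file hashes differ, so a hash-based duplicate check sees two
# unrelated uploads ...
md5_a = hashlib.md5(svg_a).hexdigest()
md5_b = hashlib.md5(svg_b).hexdigest()

# ... although both files contain the same encoded bitmap.
payload = base64.b64encode(bitmap)
shared = payload in svg_a and payload in svg_b
```

Here `md5_a` and `md5_b` differ while `shared` is true: the bitmap is stored twice, but neither file looks like a duplicate of the other.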
Therefore I advocate changing librsvg in one of the two ways laid out in
comment 3 of the bug report given above and (re)support linking to external
bitmaps in SVGs. Two strategies that come to mind to prevent disappearance
of an external resource in the web are:
1) cache external refs on thumbnail generation, check for updates on
external server on thumbnail re-generation
2) allow external refs to images residing on wikimedia servers only
Point 2) should be considered the easiest implementation, 1) is harder to
implement but gives even more freedom to SVG creators and would adhere more
closely to SVG standard. However, another argument for 2) would be the
licensing issue: It ensures that only images are linked to that have been
properly licensed by commons users and the upload process (and if a license
violation is detected and the linked-to bitmap removed from commons, the
SVG using such a bitmap breaks gracefully).
Having our servers do arbitrary calls to external resources (option 1)
isn't a realistic option from a security perspective. There are some fun
PoC SVG files that abuse this to scan a server's DMZ, attack other sites
with SQL injections, etc. Trusting an image library to correctly speak HTTP
without a memory corruption seems a little scary as well, but I'll admit I
haven't looked at librsvg's code myself.
From a privacy perspective, we also don't want to allow the situation where
a reader's device is reaching out to a server that we don't control. So if
someone includes a link to the original SVG on a webpage, and there are any
major browsers that will pull those resources in and let an attacker see
the user's IP address, we shouldn't allow that... hmm, and now that I read
the bug, I see this is Firefox's behavior in the image you uploaded. We
probably want to block that behavior.
Allowing a whitelist of WMF domains via https may be possible. In general,
the security checking we do on uploaded files is complex enough that I
don't like adding another layer of specific checks and exceptions, but if
we can find a relatively simple way to do it that maintains our security
and privacy requirements, then I wouldn't stand in the way.
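As a rough idea of what such a check could look like, here is a minimal Python sketch. It is not the check MediaWiki performs: the host list and function name are assumptions, it only looks at image elements, and a real implementation would need a hardened XML parser and would have to cover every other place a URL can appear in an SVG.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical allowlist; the real set of hosts would be a policy decision
ALLOWED_HOSTS = {"upload.wikimedia.org"}


def disallowed_image_refs(svg_text: str) -> list[str]:
    """Return the hrefs of <image> elements that are neither data: URIs
    nor https URLs on an allowlisted host."""
    # Note: ElementTree is not hardened against malicious XML; this is
    # acceptable for a sketch only.
    root = ET.fromstring(svg_text)
    bad = []
    for el in root.iter(f"{{{SVG_NS}}}image"):
        href = el.get(XLINK_HREF, el.get("href"))
        if href is None:
            continue  # no reference at all
        parts = urlsplit(href)
        if parts.scheme == "data":
            continue  # embedded bitmap, nothing is fetched
        if parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS:
            continue  # allowlisted host over https
        bad.append(href)  # everything else, including relative refs
    return bad
```

An upload would be rejected whenever the returned list is non-empty. Anything that is not explicitly allowed is flagged, which keeps the rule simple at the cost of also refusing relative references.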
Regards,
Christian
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l