While reviewing some other code, I went in and started ripping up some of the file type & validity checks in MediaWiki's upload system, as they've been driving me nuts for some time.
One quick subproject was tossing in an XML well-formedness check for SVG files. For the curious, here's a report on the invalid files I encountered while testing this with files from Commons:
http://meta.wikimedia.org/wiki/SVG_validity_checks
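Here is a minimal sketch of that kind of check. It parses the upload as XML and accepts it only if the file is well-formed and its root element is `<svg>`. It is written in Python with the standard-library `xml.etree.ElementTree` parser purely for illustration. MediaWiki's actual check is PHP and may work differently, and the function name `is_wellformed_svg` is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def is_wellformed_svg(data: bytes) -> bool:
    """Return True if data parses as well-formed XML with an <svg> root element.

    Hypothetical helper for illustration. Passing raw bytes lets the parser
    (expat) pick the encoding from a byte-order mark or XML declaration.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Covers bad nesting, invalid byte sequences, undeclared prefixes, etc.
        return False
    # Accept the namespaced root, and (leniently) a bare <svg> with no xmlns.
    return root.tag in (f"{{{SVG_NS}}}svg", "svg")
```

Passing raw bytes rather than a pre-decoded string keeps encoding errors visible as parse failures instead of papering over them.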
-- brion vibber (brion @ wikimedia.org)
On 06/02/2008, Brion Vibber brion@wikimedia.org wrote:
While reviewing some other code, I went in and started ripping up some of the file type & validity checks in MediaWiki's upload system, as they've been driving me nuts for some time. One quick subproject was tossing in an XML well-formedness check for SVG files. For the curious, here's a report on the invalid files I encountered while testing this with files from Commons: http://meta.wikimedia.org/wiki/SVG_validity_checks
This is worth noting on mediawiki.org, really.
Of particular interest are invalid SVGs created by editing tools. I have a Bastard SVG From Hell I like to throw at things (I hope to have a copy I can release soon ;-) ) created by OmniGraffle. The W3C validator hates it. Inkscape, rsvg, Safari, WebKit, Opera, Firefox and Minefield all misrender it to a greater or lesser degree. (I've yet to throw it at Batik.) But it's an SVG created by an editing program in current use ...
I was surprised to see a bad SVG from Inkscape - does opening and saving it in the current stable Inkscape sanitise it?
How sanitisable are the bad SVGs you found? How automatable would a sanitisation process be, e.g. from a command-line invocation of Inkscape?
- d.
David Gerard wrote:
This is worth noting on mediawiki.org, really.
Of particular interest are invalid SVGs created by editing tools. I have a Bastard SVG From Hell I like to throw at things (I hope to have a copy I can release soon ;-) ) created by OmniGraffle.
Oooh ooh can I have a copy? Right now I only really care that we can pass it as well-formed XML and recognize it as SVG, but that's the sort of thing that's great to test. :D
The W3C validator hates it. Inkscape, rsvg, Safari, WebKit, Opera, Firefox and Minefield all misrender it to a greater or lesser degree. (I've yet to throw it at Batik.) But it's an SVG created by an editing program in current use ...
I was surprised to see a bad SVG from Inkscape - does opening and saving it in the current stable Inkscape sanitise it?
Technically it's invalid XML -- Inkscape should refuse to open it. :)
In my spot checks, Inkscape is willing to open the ones with undeclared namespaces (and saves them correctly, yay) but won't open the ones that are outright malformed (bad character encoding, bad element nesting).
How sanitisable are the bad SVGs you found? How automatable would a sanitisation process be, e.g. from a command-line invocation of Inkscape?
For some well-known typical prefixes (xlink, sodipodi, RDF) we could fairly easily insert a namespace declaration. For others we might have to give up. :)
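Here is a rough sketch of that repair. It assumes the prefixes in question are spelled `xlink`, `sodipodi` and `rdf`, and maps them to their usual namespace URIs. It is Python for illustration only; the function name, the table, and the regex-based approach are all hypothetical, not anything in MediaWiki. Prefixes not in the table are deliberately left alone, which is the "give up" case.

```python
import re

# Hypothetical table of well-known prefixes and their usual namespace URIs.
KNOWN_NAMESPACES = {
    "xlink": "http://www.w3.org/1999/xlink",
    "sodipodi": "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def add_missing_namespaces(svg_text: str) -> str:
    """Declare known prefixes that are used but never declared, on the root <svg>."""
    # Rough scan for prefixed element/attribute names, e.g. "<rdf:RDF" or " xlink:href".
    used = set(re.findall(r"[<\s/]([A-Za-z_][\w.-]*):[A-Za-z_]", svg_text))
    declared = set(re.findall(r"xmlns:([A-Za-z_][\w.-]*)\s*=", svg_text))
    missing = sorted(p for p in KNOWN_NAMESPACES if p in used and p not in declared)
    root = re.search(r"<svg\b", svg_text)
    if not missing or root is None:
        # Nothing we know how to fix, or no unprefixed <svg> root to attach to.
        return svg_text
    decls = "".join(f' xmlns:{p}="{KNOWN_NAMESPACES[p]}"' for p in missing)
    return svg_text[: root.end()] + decls + svg_text[root.end():]
```

The regex can over-detect (e.g. `stroke:blue` in `fill:red; stroke:blue` looks like a prefix), but that's harmless here because only prefixes in the known table ever get a declaration.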
The mystery 'ns:' one seems to be specific to Adobe Illustrator's export, for instance, and shows up variously as an 'ns:' or 'ns0:' prefix in files I googled up.
Again, most likely the original files were fine, but some combination of editing manually or with other tools may have corrupted them.
-- brion
Brion Vibber wrote:
While reviewing some other code, I went in and started ripping up some of the file type & validity checks in MediaWiki's upload system, as they've been driving me nuts for some time. One quick subproject was tossing in an XML well-formedness check for SVG files. For the curious, here's a report on the invalid files I encountered while testing this with files from Commons: http://meta.wikimedia.org/wiki/SVG_validity_checks
Did you use a program? Is any of the code in SVN? When checking uploads, I only checked whether the beginning looked like SVG.
Right now I only really care that we can pass it as well-formed XML and recognize it as SVG, but that's the sort of thing that's great to test. :D
How does it work with multibyte (UCS-2) SVGs? Even when they're valid, MediaWiki shows needles instead of the thumbnail. Most problems came from people manually editing SVGs and saving them in the wrong encoding (without fixing or adding the <?xml declaration), as well as from some less common ways of writing them, such as using <!DOCTYPE declarations that I wasn't handling at the time.
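On the UCS-2 question, here is a sketch of an encoding-aware version of the "does the beginning look like SVG" sniff. It first guesses the encoding from the byte-order mark or the first bytes, following the non-normative autodetection table in Appendix F of the XML 1.0 spec, and only then looks for `<svg`. It is Python for illustration; the function names and the 1024-byte limit are hypothetical choices, not MediaWiki code.

```python
from typing import Optional

# (prefix bytes, codec) pairs. The 4-byte UTF-32 patterns come first so that
# FF FE 00 00 isn't mistaken for a UTF-16LE byte-order mark.
_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\x00<\x00?", "utf-16-be"),  # '<?' in UTF-16BE with no BOM
    (b"<\x00?\x00", "utf-16-le"),  # '<?' in UTF-16LE with no BOM
]

def sniff_xml_encoding(head: bytes) -> Optional[str]:
    """Guess a codec name from the first bytes, or None if there's no signal."""
    for signature, codec in _SIGNATURES:
        if head.startswith(signature):
            return codec
    return None

def looks_like_svg(data: bytes, limit: int = 1024) -> bool:
    """Cheap check: does '<svg' appear near the start, in the sniffed encoding?"""
    codec = sniff_xml_encoding(data) or "utf-8"  # UTF-8 is XML's default
    head = data[:limit].decode(codec, errors="replace")
    return "<svg" in head
```

A naive byte search for `b"<svg"` fails on UTF-16/UCS-2 files because every ASCII character is paired with a NUL byte. Decoding first avoids that. A file with a long comment before the root element could still miss the 1024-byte window.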