I've written a simple MediaWiki extension that uses an instance of the
W3C Validator service (via the Services_W3C_HTMLValidator
<http://pear.php.net/package/Services_W3C_HTMLValidator> PEAR package)
to validate SVG images hosted on a wiki. It is meant to replace the
current system on Commons, that relies on individual contributors adding
templates (e.g. InvalidSVG
<https://commons.wikimedia.org/wiki/Template:InvalidSVG>) by hand to
file description pages.
It exposes a simple API (and a Scribunto module as well) to get the
validation status of existing SVG files, can emit warnings when trying
to upload invalid ones, and is well integrated with MediaWiki's native
ObjectCache mechanism.
I'm in the process of publishing the code, but have some questions I
think the community could help me answer.
* Given that the W3C Validator can also parse HTML files, would it be
useful to validate wiki pages as well? Even if sometimes the
validation errors appear to be caused by MediaWiki itself, they can
also depend on malformed templates.
* Does storing the validation status of old revisions of images
(and/or articles) make sense?
* Do you think the extension should use the extmetadata property of
ApiQueryImageInfo instead of a its own module?
* Is it advisable to store validation data permanently in the database?