I've written a simple MediaWiki extension that uses an instance of the W3C Validator service (via the Services_W3C_HTMLValidator PEAR package, http://pear.php.net/package/Services_W3C_HTMLValidator) to validate SVG images hosted on a wiki. It is meant to replace the current system on Commons, which relies on individual contributors adding templates (e.g. InvalidSVG, https://commons.wikimedia.org/wiki/Template:InvalidSVG) to file description pages by hand. It exposes a simple API (and a Scribunto module as well) for retrieving the validation status of existing SVG files, can emit warnings when someone tries to upload an invalid one, and integrates well with MediaWiki's native ObjectCache mechanism. I'm in the process of publishing the code, but I have some questions I think the community could help me answer.
- Given that the W3C Validator can also parse HTML files, would it be useful to validate wiki pages as well? Although some validation errors appear to be caused by MediaWiki itself, others may stem from malformed templates.
- Does storing the validation status of old revisions of images (and/or articles) make sense?
- Do you think the extension should use the extmetadata property of ApiQueryImageInfo instead of its own module?
- Is it advisable to store validation data permanently in the database?
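To make the caching and upload-warning behaviour described above concrete, here is a minimal Python sketch (the extension itself is PHP) of a validator wrapper that caches results by the SHA-1 of the file content and turns invalid results into upload warnings. `validate` is a hypothetical stand-in for a call to a W3C Validator instance, a plain dict stands in for MediaWiki's ObjectCache, and the `svgvalidator:` key prefix and warning format are made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ValidationResult:
    # Hypothetical result shape: overall validity plus error messages.
    valid: bool
    errors: List[str] = field(default_factory=list)


class CachedSvgValidator:
    """Validate SVG bytes once per distinct content, caching by SHA-1.

    `validate` is a hypothetical callable standing in for a request to a
    W3C Validator instance; the dict stands in for an object cache.
    """

    def __init__(self, validate: Callable[[bytes], ValidationResult]):
        self._validate = validate
        self._cache: Dict[str, ValidationResult] = {}
        self.calls = 0  # how many times the backend was actually contacted

    def get(self, svg: bytes) -> ValidationResult:
        key = "svgvalidator:" + hashlib.sha1(svg).hexdigest()
        if key not in self._cache:
            # Cache miss: ask the (slow, remote) validator and remember the answer.
            self.calls += 1
            self._cache[key] = self._validate(svg)
        return self._cache[key]

    def upload_warnings(self, svg: bytes) -> List[str]:
        # Empty list for valid files, one warning per error otherwise.
        result = self.get(svg)
        return [] if result.valid else ["invalid-svg: " + e for e in result.errors]


def toy_validate(svg: bytes) -> ValidationResult:
    # Toy stand-in for the W3C Validator: only checks for a closing </svg> tag.
    if b"</svg>" in svg:
        return ValidationResult(valid=True)
    return ValidationResult(valid=False, errors=["missing </svg> end tag"])
```

Because the key is derived from the bytes, re-uploading identical content reuses the cached result, while any change to the file produces a new key. With the toy validator, calling `upload_warnings(b"<svg>")` twice contacts the backend only once.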
On 17/12/2014 17:57, Ricordisamoa wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Also, could validation results be exposed through the 'file' key of the mw.title table instead of a custom module?
Anybody interested?
On 30/12/2014 10:11, Ricordisamoa wrote:
Also, could validation results be exposed from the 'file' key of the mw.title table instead of a custom module?
On Wed, Dec 17, 2014 at 11:57 AM, Ricordisamoa <ricordisamoa@openmailbox.org> wrote:
- Given that the W3C Validator can also parse HTML files, would it be useful to validate wiki pages as well? Even if sometimes the validation errors appear to be caused by MediaWiki itself, they can also depend on malformed templates.
Meh, not sure how useful this would be. Maybe as a developer tool, but it's not something you would want running on your site. The SVG validation tool makes sense because you're validating user input.
- Does storing the validation status of old revisions of images (and/or articles) make sense?
I don't think there's any harm in it. Better to have extraneous information than to wait until users complain about not having it down the line.
- Do you think the extension should use the extmetadata property of ApiQueryImageInfo instead of its own module?
- Is it advisable to store validation data permanently in the database?
I have no idea about this, but it does seem that the metadata is propagated to the oldimage table when a new version is uploaded, so it would also cover your question above about storing the validation status of old revisions.
Exactly what information is being stored? Is it just a flag that says valid or not valid? Is it a list of errors and warnings? If so what format is it all in?
-- 
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
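Taking Tyler's observation at face value (metadata travels with a file version into the archive when a newer one is uploaded), the following Python sketch models why per-version validation status would come almost for free. The `FileRecord` and `FileVersion` classes and the `svg_valid` key are hypothetical; they only loosely mirror the split between MediaWiki's `image` (current version) and `oldimage` (previous versions) tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileVersion:
    timestamp: str
    sha1: str
    metadata: Dict[str, object]  # e.g. {"svg_valid": True}; key name is hypothetical


@dataclass
class FileRecord:
    """Toy model of a file's current version plus its archived versions."""

    current: Optional[FileVersion] = None
    history: List[FileVersion] = field(default_factory=list)

    def upload(self, version: FileVersion) -> None:
        # On re-upload the previous version, metadata included, is archived
        # rather than discarded, so its validation status survives.
        if self.current is not None:
            self.history.append(self.current)
        self.current = version

    def status_at(self, timestamp: str) -> Optional[object]:
        # Look up the stored validation status of any version, old or current.
        for v in [self.current, *self.history]:
            if v is not None and v.timestamp == timestamp:
                return v.metadata.get("svg_valid")
        return None
```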
On 19/01/2015 19:00, Tyler Romeo wrote:
Exactly what information is being stored? Is it just a flag that says valid or not valid? Is it a list of errors and warnings? If so what format is it all in?
They're instances of Services_W3C_HTMLValidator_Response, as returned by Services_W3C_HTMLValidator (https://github.com/pear/Services_W3C_HTMLValidator), and they are stored in the object cache. Each one contains the boolean validity, the lists of errors and warnings with machine-readable details, and so on. When requested via the API, they are converted to plain arrays.
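As a rough illustration of the conversion described above, this Python sketch models a cached response (a validity flag plus lists of errors and warnings) and flattens it into plain, JSON-friendly data for an API result. The field names are illustrative and not guaranteed to match Services_W3C_HTMLValidator_Response exactly; the two count fields are an added convenience, not something stated in the thread.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Message:
    # Illustrative fields, loosely modelled on a validator message
    # (position, text, machine-readable id); not the PEAR class's exact API.
    line: int
    col: int
    message: str
    messageid: str


@dataclass
class Response:
    validity: bool
    errors: List[Message] = field(default_factory=list)
    warnings: List[Message] = field(default_factory=list)


def to_api_array(resp: Response) -> Dict[str, Any]:
    """Flatten a cached Response into plain, JSON-serialisable data."""
    data = asdict(resp)  # recursively converts nested dataclasses to dicts
    data["errorcount"] = len(resp.errors)
    data["warningcount"] = len(resp.warnings)
    return data
```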
On Mon, Jan 19, 2015 at 10:00 AM, Tyler Romeo <tylerromeo@gmail.com> wrote:
- Do you think the extension should use the extmetadata property of ApiQueryImageInfo instead of its own module?
- Is it advisable to store validation data permanently in the database?
I have no idea about this, but it does seem that the metadata is propagated to the oldimage table when a new one is uploaded, so it would fulfill your above question about storing old revisions' validation status.
metadata is data generated from the file itself. It has built-in storage and invalidation mechanisms based on file upload/purge. extmetadata is assumed to come from some other source, and its providers need to handle invalidation (and permanent storage, if desirable) manually. Thus metadata would be a better fit in theory, but I don't think it currently offers any mechanism for extensions to hook into it. So I don't think you are any worse off writing a self-contained extension.
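To illustrate the extmetadata side of this distinction, here is a Python sketch of a provider that keys its store by file title and therefore has to invalidate entries itself: it detects re-uploads by comparing a stored SHA-1 of the content, and it exposes an explicit `purge`. The class and method names are hypothetical and do not correspond to any MediaWiki API.

```python
import hashlib
from typing import Callable, Dict, Tuple


class SelfManagedStore:
    """Hypothetical extmetadata-style provider store.

    Entries are keyed by file title, so nothing invalidates them
    automatically: the provider must notice re-uploads (here, by comparing
    the stored content hash) and honour explicit purges itself.
    """

    def __init__(self, compute: Callable[[bytes], object]):
        self._compute = compute
        self._entries: Dict[str, Tuple[str, object]] = {}

    def get(self, title: str, content: bytes) -> object:
        digest = hashlib.sha1(content).hexdigest()
        cached = self._entries.get(title)
        if cached is not None and cached[0] == digest:
            return cached[1]
        # Missing or stale (the file was re-uploaded): recompute and store.
        value = self._compute(content)
        self._entries[title] = (digest, value)
        return value

    def purge(self, title: str) -> None:
        # Explicit invalidation, e.g. in response to a page purge.
        self._entries.pop(title, None)
```

With file metadata, both of these invalidation paths would be handled by MediaWiki itself, which is the trade-off described above.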
On 21/01/2015 05:27, Gergo Tisza wrote:
metadata is data generated from the file. It has built-in storage and invalidation mechanisms that are based on file upload / purge. extmetadata is assumed to come from some other source, and providers need to handle invalidation (and permanent storage, if desirable) manually. Thus, metadata would be a better fit in theory, but I don't think it offers any mechanism currently for extensions to hook into it. So I don't think you are any worse off by writing a self-contained extension.
There is the GetExtendedMetadata hook (https://www.mediawiki.org/wiki/Manual:Hooks/GetExtendedMetadata), but according to the documentation (https://www.mediawiki.org/wiki/API:Properties#imageinfo_.2F_ii), the data should be in HTML format.
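If the GetExtendedMetadata route were taken, the validation result would need to be rendered as HTML. The Python sketch below builds one such field, escaping validator messages with `html.escape`. The `{"value": ..., "source": ...}` shape is modelled on what imageinfo's extmetadata output looks like, while the `svgvalidator` source name and the message wording are assumptions; the real hook would be implemented in PHP.

```python
import html
from typing import Dict, List


def to_extmetadata_entry(valid: bool, errors: List[str]) -> Dict[str, str]:
    """Build one extmetadata-style field whose value is an HTML string.

    The "svgvalidator" source name and the wording are hypothetical.
    """
    if valid:
        value = "Valid SVG"
    else:
        # Escape validator messages so they are safe to embed as HTML.
        items = "".join(f"<li>{html.escape(e)}</li>" for e in errors)
        value = f"Invalid SVG: <ul>{items}</ul>"
    return {"value": value, "source": "svgvalidator"}
```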
I had assumed that the Wikimedia Foundation would be willing to run the W3C Validator on its own servers. Now I'm asking: is that feasible?
For the record: this issue was discussed yesterday on #wikimedia-dev. Logs are available at http://bots.wmflabs.org/%7Ewm-bot/logs/%23wikimedia-dev/20150417.txt (starting at 22:48:15).
On 26/02/2015 10:40, Ricordisamoa wrote:
I was supposing that the Wikimedia Foundation would be willing to run the W3C Validator on their servers. Now I ask: is it feasible?