I would also recommend against actively trying to emit barely-parsing output. Any savings that remain after compression should be rather small, and if only end tags are omitted, the DOM will of course still be the same size after parsing.
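As a rough illustration of the compression point, here is a minimal Python sketch. It is not a measurement of real MediaWiki output: the list markup is synthetic and far more repetitive than a real page, and the names build_list and savings are made up for this example. It only shows how one might compare the bytes saved by dropping </li> end tags before and after zlib compression.

```python
import zlib


def build_list(n_items, close_tags):
    """Build a synthetic <ul> with or without explicit </li> end tags."""
    end = "</li>" if close_tags else ""
    items = "".join(f"<li>Item number {i}{end}\n" for i in range(n_items))
    return f"<ul>\n{items}</ul>\n".encode("utf-8")


def savings(n_items=500):
    """Return (raw bytes saved, compressed bytes saved) by omitting </li>."""
    full = build_list(n_items, close_tags=True)
    terse = build_list(n_items, close_tags=False)
    raw_saved = len(full) - len(terse)
    compressed_saved = len(zlib.compress(full)) - len(zlib.compress(terse))
    return raw_saved, compressed_saved


if __name__ == "__main__":
    raw_saved, compressed_saved = savings()
    print(f"raw bytes saved:        {raw_saved}")
    print(f"compressed bytes saved: {compressed_saved}")
```

The repeated end tags are exactly the kind of redundancy a compressor removes anyway, so the saving on the wire should be much smaller than the raw byte count suggests.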
In Parsoid we went to some modest lengths (https://github.com/wikimedia/parsoid/blob/master/lib/XMLSerializer.js) to produce polyglot markup (http://www.w3.org/TR/html-polyglot/), which is valid as both XML and HTML5. This lets consumers use either an XML or an HTML5 parser, which has proven very useful in practice. For example, it makes the content easier to consume with PHP's libxml. Doing the same in MediaWiki core is admittedly harder, but I still think that we should follow the robustness principle (https://en.wikipedia.org/wiki/Robustness_principle) wherever we can.
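To make the "either parser" point concrete, here is a small Python sketch using only the standard library. The markup strings are invented for illustration and are not Parsoid output, and Python's html.parser is only a tokenizer rather than a full HTML5 tree builder, so this is an approximation of what an HTML5 consumer would see. The helper names (xml_tags, html_tags, parses_as_xml) are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment in the polyglot style: every element is explicitly
# closed, the void element uses the self-closing form, attributes are quoted.
POLYGLOT = (
    '<div id="content">'
    '<p>First paragraph<br/>with a line break</p>'
    '<p>Second paragraph</p>'
    '</div>'
)

# The same content with end tags omitted: acceptable to an HTML parser,
# but not well-formed XML.
TAG_SOUP = '<div id="content"><p>First paragraph<br>with a line break<p>Second paragraph</div>'


class TagCollector(HTMLParser):
    """Record start tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <br/>: the default handle_startendtag
        # delegates to handle_starttag and handle_endtag.
        self.tags.append(tag)


def html_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tags(markup):
    # iter() walks the tree in document order, starting at the root.
    return [el.tag for el in ET.fromstring(markup).iter()]


def parses_as_xml(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(html_tags(POLYGLOT))
    print(xml_tags(POLYGLOT))
    print(parses_as_xml(TAG_SOUP))
```

Both parsers see the same elements in the polyglot fragment, while the variant with omitted end tags is only usable by the HTML side.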
Gabriel
On Wed, Feb 18, 2015 at 5:59 PM, Tim Starling <tstarling@wikimedia.org> wrote:
> On 19/02/15 08:43, Gergo Tisza wrote:
> > On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena <benapetr@gmail.com> wrote:
> > > (Perhaps wgWellFormedXml is true by default?)
> >
> > It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
>
> There was a Bugzilla report and Gerrit change requesting that it be set to false:
>
> https://phabricator.wikimedia.org/T52040
> https://gerrit.wikimedia.org/r/#/c/70036/
>
> I was against it, partly because of the omitted <head> tag.
>
> -- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l