I'd say an HTML5 output mode *ought* to work like this:
* Don't try to be clever.
  * Consistency and predictability are key to both security review and data consumability.
* Quote attributes consistently and predictably.
  * Always use double-quotes on attributes in output.
* Output specced empty tags in HTML style.
  * <img>, <hr>, <br> are fine and not ambiguous at all to an HTML parser. There's no need to go adding a "/" in at the end!
  * These are already whitelisted in the Html class so it's easy to not mess this up.
* Don't do other silly things for old-school XHTML 1.
  * CDATA wrapping of <script>s and <style>s is not needed.
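
To make the rules above concrete, here is a rough sketch of what such an output mode could look like. It is written in Python purely for illustration and is not MediaWiki's Html class: the names VOID_ELEMENTS, open_tag and element are hypothetical, and the void-element list is assumed from the HTML specification.

```python
from html import escape

# Assumption: void elements as listed in the HTML specification.
# These never take an end tag and need no trailing "/".
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def open_tag(name, attrs=None):
    """Build a start tag; attribute values are always double-quoted."""
    parts = [name]
    for key, value in (attrs or {}).items():
        # escape(..., quote=True) also escapes '"', so the value
        # cannot break out of the double quotes.
        parts.append(f'{key}="{escape(str(value), quote=True)}"')
    return "<" + " ".join(parts) + ">"


def element(name, attrs=None, contents=""):
    """Serialize one element in plain HTML5 style.

    `contents` is inserted as-is (no CDATA wrapping); the caller is
    responsible for making it safe.
    """
    start = open_tag(name, attrs)
    if name in VOID_ELEMENTS:
        return start  # <br>, not <br/> or <br></br>
    return f"{start}{contents}</{name}>"


print(element("img", {"src": "a.png", "alt": 'say "hi"'}))
print(element("br"))
print(element("script", None, "if (a < b) { run(); }"))
```

There is exactly one code path per decision (quoting, void elements, raw text), which is the point: nothing here depends on a mode flag.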
The only benefit of $wgWellFormedXml was that you could toss your "well-formed" tag soup into an XML parser that didn't grok HTML. I have no idea if that worked reliably or was actually useful to anyone, but it's probably worth confirming that before actually removing the funky self-closing tags.
-- brion
On Mon, May 2, 2016 at 11:42 AM, Brian Wolff <bawolff@gmail.com> wrote:
So currently, we have two ways of outputting HTML: $wgWellFormedXml = true (the default) outputs HTML that happens to conform to the rules of XML. $wgWellFormedXml = false, on the other hand, uses the laxer HTML5 rules to save a few bytes.
Having two modes of output feels rather silly to me. Originally I think this was meant as a feature flag while $wgWellFormedXml = false stabilized, but it never got turned on, and here we are 7 years later.
Having $wgWellFormedXml = false increases the complexity of the code, and not all that many people use it (a notable exception is translatewiki). I think it's important that security-critical code be as simple as possible. Furthermore, there seems to be very little benefit to having the second mode (once you account for gzip, saving a few bytes by writing <img> instead of <img/> really doesn't matter, imo).
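
For anyone who wants to check the gzip point, a quick measurement along these lines would do. This is only a sketch in Python: the page() helper and the sample markup are made up for illustration, and real pages will of course give different numbers.

```python
import gzip


def page(self_closing, n=200):
    """Build n <img> tags, with or without the XML-style trailing slash."""
    close = "/>" if self_closing else ">"
    tags = [
        f'<img src="/images/{i}.png" alt="image {i}"{close}'
        for i in range(n)
    ]
    return "\n".join(tags).encode("utf-8")


html5 = page(self_closing=False)
xml = page(self_closing=True)

# One extra "/" per tag before compression.
raw_diff = len(xml) - len(html5)
# Difference that actually goes over the wire with gzip enabled.
gz_diff = len(gzip.compress(xml)) - len(gzip.compress(html5))

print("uncompressed difference:", raw_diff, "bytes")
print("gzipped difference:", gz_diff, "bytes")
```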
With that in mind, I would like to propose killing $wgWellFormedXml = false. I'm not so much attached to the true mode (although I do feel the true mode is significantly more sane); I simply want there to be a single mode. Setting the default to false was vetoed in T52040, so I think true is the best choice going forward if we are getting rid of one of the modes.
If there are aspects of the other mode that people really want, then I think we should simply merge them into the default behavior instead of having two separate modes.
See Gerrit patch https://gerrit.wikimedia.org/r/286495. I would appreciate everyone's feedback.
Thanks, Brian
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l