Aryeh Gregor wrote:
> On Tue, Jul 7, 2009 at 2:37 AM, Remember the dot <rememberthedot@gmail.com> wrote:
>> Why be cruel to our bot operators? XHTML is simpler and more consistent than tag soup HTML, and it's a lot easier to find a good XML parser than a good HTML parser.
>
> Because it will make the markup easier to read and write for humans, and smaller. Things like leaving off superfluous closing tags do not make for "tag soup". One of the great features of HTML 5 is that it very carefully defines the text/html parsing model in painstaking backward-compatible detail. For example, the description of unquoted attributes is as follows:
Technically, HTML 4 is pretty much the same in this regard: it's 100% legitimate SGML and HTML 4 to omit start and end tags that can be implied, drop the quotes on attribute values that are unambiguous, etc.
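As a rough illustration of how little per-element knowledge is needed just to read unquoted attribute values, here is a minimal Python sketch using the standard library's html.parser. Assumptions: html.parser is a lenient tokenizer, not an SGML or HTML 5 parser, and AttributeCollector and start_tags are hypothetical names made up for this example. Note that the tokenizer simply reports both <p> start tags; deciding where the omitted </p> belongs is left to whatever builds the tree.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: record (tag, attributes) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, with any quotes already stripped.
        self.seen.append((tag, attrs))


def start_tags(markup):
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


# Unquoted attribute values and omitted </p> end tags.
print(start_tags("<p class=note>first<p class=note>second"))
# [('p', [('class', 'note')]), ('p', [('class', 'note')])]
```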
HTML 5 is a little better, I think, in that it specifies which SGML short forms must be supported and which shouldn't be (for instance, few browsers support the SGML null end tag short form: <b/this is some bold text/).
The primary advantage of the XML formulation is that you can parse the document tree unambiguously *without* knowing the spec of the individual markup; once implied tags are omitted, the consumer needs to know what to expect.
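A minimal Python sketch of that advantage, assuming the standard library's xml.etree.ElementTree as the generic XML parser (try_generic_xml_parse is a hypothetical helper): the fully closed version parses into a tree with no knowledge of what <ul> or <li> mean, while the version with implied </li> end tags is rejected, since only a consumer that knows HTML's implied-end-tag rules can tell where each <li> stops.

```python
import xml.etree.ElementTree as ET


def try_generic_xml_parse(markup):
    """Hypothetical helper: parse with a generic XML parser, or return None."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


xhtml = "<ul><li>one</li><li>two</li></ul>"  # every element explicitly closed
html_style = "<ul><li>one<li>two</ul>"       # </li> left implied, HTML-style

root = try_generic_xml_parse(xhtml)
print([li.text for li in root])         # ['one', 'two']
print(try_generic_xml_parse(html_style))  # None (the generic parser rejects it)
```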
Is this really a huge advantage when the impliable elements are well known, as they are in HTML? I dunno.
It can cause problems when a new element with implied behavior is added, as with WebKit's initial <canvas> implementation. (Apple implemented it as an empty element with an implied end tag, whereas Mozilla requires an explicit </canvas> so the markup won't confuse parsers that don't know the element is supposed to be empty and thus closed immediately.)
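To make the <canvas> problem concrete, here is a hypothetical tree builder sketched in Python on top of the standard library's html.parser tokenizer. TinyTreeBuilder, parent_of and the void_elements parameter are illustrative assumptions, not how any browser actually works: the builder only treats an element as empty if it is listed in void_elements. Given Apple-style markup with no </canvas>, a builder that knows canvas is empty puts the following <p> inside the <div>, while one that doesn't nests it inside the canvas.

```python
from html.parser import HTMLParser


class TinyTreeBuilder(HTMLParser):
    """Hypothetical minimal tree builder that records each start tag's parent.

    void_elements is the consumer's assumed knowledge of which elements are
    empty (never take an end tag). Any other element stays open until its
    end tag is seen.
    """

    def __init__(self, void_elements):
        super().__init__()
        self.void_elements = set(void_elements)
        self.stack = ["#root"]  # currently open elements, innermost last
        self.parents = {}       # tag name -> parent tag name (last one seen wins)

    def handle_starttag(self, tag, attrs):
        self.parents[tag] = self.stack[-1]
        if tag not in self.void_elements:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element;
        # stray end tags with no open element are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parent_of(markup, tag, void_elements):
    builder = TinyTreeBuilder(void_elements)
    builder.feed(markup)
    builder.close()
    return builder.parents.get(tag)


# Apple-style markup: <canvas> written as an empty element, no </canvas>.
markup = '<div><canvas id="c"><p>after the canvas</p></div>'

print(parent_of(markup, "p", {"canvas"}))  # div    (builder knows canvas is empty)
print(parent_of(markup, "p", set()))       # canvas (builder doesn't know)
```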
But as long as new markup extensions are used unambiguously, HTML 5 should be no more ambiguous than, and just as extensible as, the XML formulation.
-- brion