Aryeh Gregor wrote:
On Tue, Jul 7, 2009 at 2:37 AM, Remember the dot
<rememberthedot(a)gmail.com> wrote:
Why be cruel to our bot operators? XHTML is simpler and more consistent
than tag soup HTML, and it's a lot easier to find a good XML parser than
a good HTML parser.
Because it will make the markup easier to read and write for humans,
and smaller. Things like leaving off superfluous closing elements do
not make for "tag soup". One of the great features of HTML 5 is that
it very carefully defines the text/html parsing model in painstaking
backward-compatible detail. For example, the description of unquoted
attributes is as follows:
Technically HTML 4 is pretty much the same in this regard; it's 100%
legitimate SGML and HTML 4 to omit implied opening and closing tags,
drop quotes on attribute values that are unambiguous, etc.
HTML 5 is a little better, I think, in that it specifies which SGML short
forms are required to be supported and which are not (for instance, few
browsers support this SGML short form: <b/this is some bold text/).
The primary advantage of the XML formulation is that you can parse the
document tree unambiguously *without* knowing the spec of the individual
markup vocabulary. Omitting implied tags means the consumer needs to know
which ones to expect.
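
To make that concrete, here is a rough sketch in Python (standard library
only). The sample markup and the NaiveNesting class are made up for
illustration; it shows a consumer that knows nothing about HTML's
implied-tag rules, not how any real browser parses.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML formulation: every element is closed explicitly, so a generic
# XML parser recovers the tree without knowing anything about <li>.
xhtml = "<ul><li>one</li><li>two</li></ul>"
xml_children = [child.tag for child in ET.fromstring(xhtml)]

# HTML formulation with the implied </li> tags left out.
html = "<ul><li>one<li>two</ul>"

# A generic XML parser rejects it outright (mismatched tag).
try:
    ET.fromstring(html)
    xml_accepts_html = True
except ET.ParseError:
    xml_accepts_html = False


class NaiveNesting(HTMLParser):
    """Hypothetical consumer: tracks nesting from explicit tags only,
    with no knowledge of which end tags HTML allows you to omit."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.seen = []  # (tag, depth at which it was opened)

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, self.depth))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


naive = NaiveNesting()
naive.feed(html)
naive.close()

print(xml_children)      # both <li> are direct children of <ul>
print(xml_accepts_html)  # the XML parser refused the tag-omitting form
print(naive.seen)        # second <li> ends up nested inside the first
```

Without the rule that a new <li> closes the previous one, the naive
consumer puts the second item inside the first instead of beside it.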
Is this really a huge advantage when the impliable elements are
well-known as in HTML? I dunno.
It can cause problems when a new element with implied behavior is added,
as with WebKit's initial <canvas> implementation. (Apple implemented it
as an implicitly empty element with no closing tag, whereas Mozilla
requires you to close it explicitly so it won't confuse parsers that
don't know it should be empty and thus closed immediately.)
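
A small sketch of that failure mode, again in Python with the standard
library only. StackTracker, parent_of_p and the sample markup are
hypothetical; the set of "void" elements stands in for whatever a given
parser happens to know about.

```python
from html.parser import HTMLParser


class StackTracker(HTMLParser):
    """Hypothetical tree tracker: records each element's parent, treating
    only the tags listed in void_elements as implicitly empty."""

    def __init__(self, void_elements):
        super().__init__()
        self.void_elements = set(void_elements)
        self.stack = []    # currently open elements
        self.parents = {}  # tag -> parent tag (None at the top level)

    def handle_starttag(self, tag, attrs):
        self.parents[tag] = self.stack[-1] if self.stack else None
        if tag not in self.void_elements:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parent_of_p(markup, void_elements):
    tracker = StackTracker(void_elements)
    tracker.feed(markup)
    tracker.close()
    return tracker.parents["p"]


implied = "<div><canvas><p>after</p></div>"            # no closing tag
explicit = "<div><canvas></canvas><p>after</p></div>"  # closed explicitly

print(parent_of_p(implied, {"canvas"}))   # parser that knows canvas is empty
print(parent_of_p(implied, set()))        # parser that has never heard of it
print(parent_of_p(explicit, {"canvas"}))
print(parent_of_p(explicit, set()))
```

With the implied form the two parsers disagree about where the paragraph
lives; with the explicit close they agree.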
But as long as new markup extensions are used unambiguously, HTML 5
should be no more ambiguous and just as extensible as the XML formulation.
-- brion