[Mediawiki-l] XHTML generation

Brion Vibber brion at pobox.com
Thu Feb 17 19:19:31 UTC 2005

NSK wrote:
> <a href="http://portal.wikinerds.org/canada-flag" class='external'
> title="http://portal.wikinerds.org/canada-flag"
> rel="nofollow">link</a>
> The XHTML above is generated by Wikipedia, but in my opinion MW should use
> only " and not ' i.e. it should be class="external" instead of 'external'

Thank you for your opinion. It will be dutifully studied, found wanting,
and discarded.

For the people in the audience who are interested in accuracy, note that
XML allows attribute values to be quoted either with double quotes or
single quotes. Here's the formal lexical definition[1]:

   AttValue ::=     '"' ([^<&"] | Reference)* '"'
                 |  "'" ([^<&'] | Reference)* "'"

Classic SGML-based HTML additionally allows attribute values to be
unquoted if they contain only certain characters[2] (e.g. border=1, but
*not* bgcolor=#EEEEEE, which is actually illegal!). XHTML limits itself
to XML's stricter syntax, so only single- and double-quoted attribute
values are allowed.
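
To make the rule above concrete, here is a minimal sketch that feeds
the three attribute styles to PHP's XML parser. It assumes the DOM
extension is available; classAttribute() is a hypothetical helper made
up for this illustration, not anything in MediaWiki.

```php
<?php
// Hypothetical helper: returns the class attribute of the root element
// if $markup is well-formed XML, or null if the parser rejects it.
function classAttribute(string $markup): ?string
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc->documentElement->getAttribute('class') : null;
}

$double   = classAttribute('<a class="external">link</a>');
$single   = classAttribute("<a class='external'>link</a>");
$unquoted = classAttribute('<a class=external>link</a>');

var_dump($double === $single); // both quote styles yield the same value
var_dump($unquoted);           // the unquoted form is not well-formed XML
```

An XML parser sees no difference between the two quoted forms; only the
unquoted one is rejected.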

Since strings in PHP source code are themselves usually either single or
double-quoted, instances of the same quote character within the string
must be escaped with a backslash to appear as a literal character.
Convenience for the coder thus often prompts the use of one or the other
quote style for XHTML markup being produced from PHP code. Something
like "<p class='error'>$err</p>" is easier for the coders to read than
"<p class=\"error\">$err</p>".

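As a small illustration of that trade-off, the sketch below builds the
same paragraph three ways. The value of $err is invented for the
example, and the concatenated variant is just one more common
alternative, not something prescribed here.

```php
<?php
$err = 'File not found'; // hypothetical error message

// Double-quoted PHP string, single-quoted attribute: no escaping needed.
$singleQuotedAttr = "<p class='error'>$err</p>";

// Double-quoted PHP string, double-quoted attribute: backslashes needed.
$escapedDouble = "<p class=\"error\">$err</p>";

// Single-quoted PHP string with concatenation: double-quoted attribute
// without backslashes, but no variable interpolation.
$concatenated = '<p class="error">' . $err . '</p>';

echo $singleQuotedAttr, "\n";
echo $escapedDouble, "\n";
echo $concatenated, "\n";
```

The last two strings are byte-for-byte identical; the first differs
only in the quote character around the attribute value.
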
When outputting user-supplied data escaped with htmlspecialchars(),
double quotes are generally used: by default that function transforms
" to &quot; but leaves ' untouched, so additional work is needed to
produce a string suitable for literal inclusion in a single-quoted XML
attribute.
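
A short sketch of the difference, with the flags passed explicitly
rather than relying on the default. The flags are spelled out because
the default described above is ENT_COMPAT, while PHP 8.1 and later
default to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401. The sample value
and the 'UTF-8' charset are assumptions for illustration.

```php
<?php
$title = 'Say "hi" to O\'Brien & co.'; // hypothetical user-supplied value

// ENT_COMPAT escapes & < > and ", but leaves ' alone.
// The result is safe inside a double-quoted attribute only.
$forDouble = htmlspecialchars($title, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES additionally escapes ', so the result is safe inside
// either a single-quoted or a double-quoted attribute.
$forSingle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo "<a title=\"$forDouble\">link</a>\n";
echo "<a title='$forSingle'>link</a>\n";
```

Putting the ENT_COMPAT result into a single-quoted attribute would let
the apostrophe in the data terminate the attribute early, which is the
"additional work" the single-quoted form demands.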

Both quote forms are equally legal and produce equivalent results, so
source code readability tends to outweigh the minor issue of aesthetic
consistency in the markup output. Markup output is already butt-ugly
because it's not spaced or indented nicely, and nobody's going to look
at it very often; it's for consumption by the browser, while the source
code is maintained by human programmers.

[1] http://www.w3.org/TR/REC-xml/#sec-common-syn
[2] http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2

-- brion vibber (brion @ pobox.com)
