-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Simetrical wrote:
> On Thu, Jul 31, 2008 at 3:48 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
>> HTML 4 defines the contents of those elements as CDATA in the DTD, just
>> like <br> and <img> are defined as having no content so there's no
>> ambiguity when they're being interpreted by an HTML parser.
>> XHTML doesn't provide for that sort of declaration, since XML requires
>> you to be able to parse a document without having a DTD ahead of time.
>> For compatibility of documents between both HTML and XHTML parsers,
>> XHTML 1.0 recommends using linked resources if possible -- so there's no
>> worry about how to escape contents -- or else using explicit
>> <![CDATA[...]]> sections in your <script> and <style> elements.
> So in fact, a compliant HTML parser *would* parse the contents of
> <script> or <style> incorrectly, if it contained entities that were
> expected to be decoded?
Right... I just did a quick test to confirm. This file:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<title>Test</title>
</head>
<body>
<script>
alert("&amp;");
</script>
</body>
</html>
will display "&amp;" when served as text/html and "&" when served as
application/xhtml+xml.
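The same difference can be reproduced outside a browser. This is a minimal sketch, assuming Python's standard html.parser as a stand-in for a text/html parser and xml.etree.ElementTree as a stand-in for an application/xhtml+xml parser. Neither is a browser engine, and the names DOC, ScriptCollector, script_text_as_html and script_text_as_xml are made up for illustration. The script body here contains the entity reference &amp;.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Cut-down version of the test file; the script body holds an entity reference.
DOC = (
    "<html><head><title>Test</title></head><body>"
    '<script>alert("&amp;");</script>'
    "</body></html>"
)


class ScriptCollector(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text_as_html(doc):
    # HTML rules: <script> content is CDATA, so entities are left alone.
    parser = ScriptCollector()
    parser.feed(doc)
    parser.close()
    return "".join(parser.chunks)


def script_text_as_xml(doc):
    # XML rules: <script> is ordinary parsed content, so entities are decoded.
    return ET.fromstring(doc).find("body/script").text


if __name__ == "__main__":
    print("HTML parser sees:", script_text_as_html(DOC))
    print("XML parser sees: ", script_text_as_xml(DOC))
```

The HTML-style parser hands the script engine the literal characters of the entity reference, while the XML parser hands it a single ampersand.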
> In that case my fix is wrong, and we should
> write up a Sanitizer::escapeCdata() and use that here (and elsewhere).
Icky... but perhaps no good way around it I guess. :)
The tricky bit is that for HTML mode you want to wrap the "<![CDATA["
and "]]>" bits in comments (/* blah */) so they don't interfere with the
JS or CSS code.
Does that necessarily work in general? Bah, this shouldn't be so
annoyingly hard...
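A rough sketch of the comment-wrapping idea, in Python for illustration only. It is not the proposed Sanitizer::escapeCdata() and not MediaWiki code; wrap_script_cdata and xml_script_text are hypothetical names. The sketch refuses input containing "]]>" or "</", on the assumption that no single escaping of either sequence is safe for both parser families, which is one reason the trick may not work in general.

```python
import xml.etree.ElementTree as ET

# CDATA markers hidden inside /* */ comments, which are valid in both JS and CSS.
CDATA_OPEN = "/*<![CDATA[*/"
CDATA_CLOSE = "/*]]>*/"


def wrap_script_cdata(code: str) -> str:
    """Hypothetical helper: wrap inline JS or CSS for HTML and XHTML parsers."""
    if "]]>" in code:
        # Would end the CDATA section early under XML parsing.
        raise ValueError('code contains "]]>"')
    if "</" in code:
        # Conservative check: "</" can end the element under HTML parsing.
        raise ValueError('code contains "</"')
    return f"{CDATA_OPEN}\n{code}\n{CDATA_CLOSE}"


def xml_script_text(wrapped: str) -> str:
    """Shows what an XML parser passes on as the script text."""
    return ET.fromstring(f"<script>{wrapped}</script>").text


if __name__ == "__main__":
    wrapped = wrap_script_cdata('alert("&");')
    print(wrapped)                   # what an HTML parser hands to the script engine
    print(xml_script_text(wrapped))  # what an XML parser hands to the script engine
```

Under HTML parsing the markers sit inside comments and are ignored by the script engine. Under XML parsing the markers are consumed and what remains around the code is a pair of empty comments.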
- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkiSO+0ACgkQwRnhpk1wk44DSgCgvlRHqusLo6wnqI0HgCXj7ywJ
vRAAn0x2jpS1cHYhhD0UbsqsDabhuZzn
=FjJ7
-----END PGP SIGNATURE-----