On Thu, Jul 31, 2008 at 3:48 PM, Brion Vibber <brion@wikimedia.org> wrote:
HTML 4 defines the contents of those elements as CDATA in the DTD, just as <br> and <img> are defined as having no content, so there's no ambiguity when an HTML parser interprets them.
XHTML doesn't provide for that sort of declaration, since XML requires you to be able to parse a document without having a DTD ahead of time.
For compatibility of documents between both HTML and XHTML parsers, XHTML 1.0 recommends using linked resources if possible -- so there's no worry about how to escape contents -- or else using explicit
<![CDATA[...]]> sections in your <script> and <style> elements.
So a compliant HTML parser *would* in fact misread the contents of <script> or <style> if they contained entities that were expected to be decoded? In that case my fix is wrong, and we should write a Sanitizer::escapeCdata() and use it here (and elsewhere).
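
To make the question concrete, here is a minimal sketch of the behaviour being described. It uses Python's standard-library html.parser as a stand-in for "an HTML parser"; that parser is not a strict HTML 4 implementation, and the TextCollector class and collect_text helper are hypothetical names made up for this illustration. The point is only that an entity inside <script> reaches the consumer undecoded, while the same entity in ordinary markup is decoded.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text reported for each element, keyed by tag name.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        # convert_charrefs=True asks the parser to decode entities in
        # ordinary text; script/style contents are still passed through raw.
        super().__init__(convert_charrefs=True)
        self._open_tags = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._open_tags:
            tag = self._open_tags[-1]
            self.text[tag] = self.text.get(tag, "") + data


def collect_text(markup):
    """Parse markup and return a dict of tag name -> text seen in that tag."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text


sample = "<p>a &lt; b</p><script>if (a &lt; b) { run(); }</script>"
result = collect_text(sample)
print(repr(result["p"]))       # entity decoded in ordinary content
print(repr(result["script"]))  # entity left as-is inside <script>
```

If the script body had been entity-escaped on output in the expectation that the browser would decode it, the script engine would see the literal text "&lt;" instead of "<", which is the failure mode the question above is asking about.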