As devotees of web standards are aware, HTML5 is no longer an XML variant (nor is it SGML).
This occasionally leads to fun times in Visual Editor and Parsoid land, as we try to work around various browser incompatibilities to ensure that documents are parsed consistently. Parsoid uses an HTML5 parser, but it uses its own non-HTML5-spec serializer (i.e., not `document.body.outerHTML`) in order to emit XML-compatible documents that work around certain browser bugs (and that use intelligent quoting to reduce document size). Visual Editor tries to parse Parsoid output using the browser's XML serializer, due to bugs in Internet Explorer (I believe), and then fixes up the output to match the HTML5 parser spec for <pre> tags. I'm not sure exactly how Visual Editor serializes its documents to send them back to Parsoid, but I bet it's not quite the same way Parsoid serializes them.
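To make the gap between "HTML-style" and "XML-style" serialization concrete, here is a minimal sketch in Python. It uses the standard library's `xml.etree.ElementTree` purely as a stand-in: it is not Parsoid's serializer, Visual Editor's code, or a browser's XML serializer, and the function names and the sample tree are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Build a tiny tree: <div>a<br>b<script>...</script></div> (illustrative only)."""
    div = ET.Element("div")
    div.text = "a"
    br = ET.SubElement(div, "br")
    br.tail = "b"
    script = ET.SubElement(div, "script")
    script.text = "if (a < b) {}"
    return div


def serialize_both(elem):
    """Return (xml_text, html_text) for the same in-memory tree."""
    # XML method: empty elements are self-closed and all text is escaped.
    xml_text = ET.tostring(elem, encoding="unicode", method="xml")
    # HTML method: void elements get no end tag, and <script>/<style>
    # text is written out without escaping.
    html_text = ET.tostring(elem, encoding="unicode", method="html")
    return xml_text, html_text


if __name__ == "__main__":
    xml_text, html_text = serialize_both(build_sample())
    print("xml: ", xml_text)
    print("html:", html_text)
```

The same tree comes out differently under the two methods (the void `br` element and the `<` inside `script` are the visible cases here), which is the kind of mismatch that bites when one side serializes one way and the other side parses the other way.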
In any case, I filed bugs with the W3C months ago to try to fix some of the specs. In particular, there is no official spec algorithm for serializing an HTML document as XML. That may now be fixed! See https://www.w3.org/Bugs/Public/show_bug.cgi?id=13410 (start at comment 13 if you are impatient).
It would probably be worth auditing VE's and Parsoid's serialization algorithms to ensure that they are compatible with the new draft standard ( http://www.w3.org/TR/DOM-Parsing/#dfn-concept-xml-serialization-algorithm ), so that we can suggest improvements if we turn out to have interesting corner cases or weird hacks that are needed for interoperability in the real world.
(And see also https://www.w3.org/Bugs/Public/show_bug.cgi?id=25225 -- it turns out that not even the HTML serializer API is completely defined in the spec, although `outerHTML` provides a means to get at the HTML fragment serializer. We had some issues with disappearing whitespace in the outer contexts of HTML documents as a result.) --scott
wikitech-l@lists.wikimedia.org