On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian <cananian@wikimedia.org> wrote:
> For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
I didn't mean it as a test of Domino; I meant it as a test of Parsoid: does it generate things that are then foster-parented out, or other things that a compliant DOM parser won't round-trip? It's also a more realistic test, because in practice Parsoid serializes its DOM and sends it over the wire to VE, which does things with it and gives an HTML string back, which is then parsed through Domino. So even in normal operation, ignoring the fact that VE runs stuff through the browser's DOM parser, Parsoid itself already effectively round-trips the HTML through Domino.
> The foster parenting issues mostly arise in the wikitext->parsoid DOM phase. Basically, the wikitext is tokenized into an HTML tag soup and then a customized version of the standard HTML parser is used to assemble the soup into a DOM, mimicking the process by which a browser would parse the tag soup emitted by the current PHP parser. So the existing test suite does expose these foster-parenting issues already.
Does it really? There were a number of foster-parenting issues a few months ago where Parsoid inserted <meta> tags in places where they aren't allowed (e.g. inside <tr>s), and no one in the Parsoid team seemed to have noticed until I tracked down a few VE bugs to that problem.
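
To make the <meta>-in-<tr> case concrete, here is a rough sketch in Python (not Parsoid or Domino code) of a check that flags elements a compliant HTML5 parser would foster-parent out of a table. The names find_foster_parented and FosterParentingChecker are made up for illustration. The rules are a simplified subset of the HTML5 tree-construction algorithm (text nodes, <form>, and hidden <input> are ignored), and the sketch assumes the input has explicit end tags, as serialized DOM output normally does.

```python
from html.parser import HTMLParser

# Elements whose direct children are handled by the table-related
# insertion modes of the HTML5 parser (simplified).
TABLE_CONTEXT = frozenset({"table", "thead", "tbody", "tfoot", "tr", "colgroup"})

# Start tags that are NOT foster-parented in those contexts (simplified;
# <form> and <input type=hidden> special cases are deliberately left out).
ALLOWED_IN_TABLE_CONTEXT = frozenset({
    "caption", "colgroup", "col", "thead", "tbody", "tfoot", "tr", "td", "th",
    "script", "style", "template",
})

# Void elements never get an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class FosterParentingChecker(HTMLParser):
    """Hypothetical helper: records (tag, parent) pairs that would be moved."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.misplaced = []      # (tag, parent) pairs that would be foster-parented

    def handle_starttag(self, tag, attrs):
        parent = self.open_elements[-1] if self.open_elements else None
        if parent in TABLE_CONTEXT and tag not in ALLOWED_IN_TABLE_CONTEXT:
            self.misplaced.append((tag, parent))
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def find_foster_parented(html):
    """Return the (tag, parent) pairs a compliant parser would move out."""
    checker = FosterParentingChecker()
    checker.feed(html)
    checker.close()
    return checker.misplaced


if __name__ == "__main__":
    bad = "<table><tbody><tr><meta><td>x</td></tr></tbody></table>"
    print(find_foster_parented(bad))  # → [('meta', 'tr')]
```

Running something like this over the serialized output of each parser test would catch this class of problem without going through a full parser, though a real round-trip through Domino is still the more faithful test.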
Roan