On 07/25/2013 01:03 PM, Roan Kattouw wrote:
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian cananian@wikimedia.org wrote:
For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
I didn't mean it as a test of Domino, I meant it as a test of Parsoid: does it generate things that are then foster-parented out, or other things that a compliant DOM parser won't round-trip? It's also a more realistic test, because the way that Parsoid is actually used by VE in practice is that it serializes its DOM, sends it over the wire to VE, which then does things with it and gives an HTML string back, which is then parsed through Domino. So even in normal operation, ignoring the fact that VE runs stuff through the browser's DOM parser, Parsoid itself already round-trips the HTML through Domino, effectively.
We use two different libraries for different things:
* html5 library for building a DOM from a tag soup
* domino for serializing DOM --> HTML and for parsing HTML --> DOM
When doing a WT2WT roundtrip test, there are 2 ways to do this:
1. wikitext --> tag soup --> DOM (in-memory tree) --> wikitext
2. wikitext --> tag soup --> DOM (in-memory tree) --> HTML (string) --> DOM --> wikitext
We currently use path 1 in our wt2wt testing. If there are foster-parenting bugs in the HTML5 library, path 1 hides them, because the in-memory tree is never re-parsed. However, when using VE and serializing its result back to wikitext, we are effectively using path 2.
And both Roan and Scott are correct. Path 2 would be a test of external libraries (HTML5 and Domino, not just domino). We did have bugs in the HTML5 parsing library we used (which I fixed based on reports from Roan), and we then added those cases to the parser tests.
But if we use path 2 for all our round-trip testing of Wikipedia pages, other latent bugs with fostered content will show up.
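To make the difference between the two paths concrete, here is a rough toy model in Python. It is not Parsoid, html5, or domino code: `Node`, `serialize`, `parse`, `path1`, `path2`, and the `ALLOWED` table are all hypothetical names made up for this sketch. The toy parser handles tags only (no text or attributes) and applies a much simplified version of the HTML5 foster-parenting rule: an element that is not allowed inside a table context is inserted before the nearest enclosing table.

```python
import re
from dataclasses import dataclass, field

# Toy content model (assumption): which children each table-related
# element accepts. Anything else gets foster-parented.
ALLOWED = {"table": {"tbody", "tr"}, "tbody": {"tr"}, "tr": {"td", "th"}}


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    tag: str
    children: list = field(default_factory=list)


def serialize(node):
    """DOM --> HTML string (tags only)."""
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def parse(html):
    """HTML string --> DOM, with simplified foster parenting."""
    root = Node("#root")
    stack = [root]
    for closing, tag in re.findall(r"<(/?)(\w+)>", html):
        if closing:
            # Pop open elements until the matching one is closed.
            while len(stack) > 1:
                if stack.pop().tag == tag:
                    break
            continue
        node = Node(tag)
        parent = stack[-1]
        allowed = ALLOWED.get(parent.tag)
        tables = [i for i, n in enumerate(stack) if n.tag == "table"]
        if allowed is not None and tag not in allowed and tables:
            # Foster-parent: insert before the nearest enclosing table.
            table = stack[tables[-1]]
            table_parent = stack[tables[-1] - 1]
            position = table_parent.children.index(table)
            table_parent.children.insert(position, node)
        else:
            parent.children.append(node)
        stack.append(node)
    return root.children[0]


def path1(dom):
    """Path 1: the in-memory tree goes straight to the serializer."""
    return dom


def path2(dom):
    """Path 2: DOM --> HTML string --> DOM, as happens with VE."""
    return parse(serialize(dom))


# A DOM with a <meta> directly inside a <tr>, built in memory.
dom = Node("body", [Node("table", [Node("tr", [Node("meta"), Node("td")])])])

print(serialize(path1(dom)))
# <body><table><tr><meta></meta><td></td></tr></table></body>
print(serialize(path2(dom)))
# <body><meta></meta><table><tr><td></td></tr></table></body>
```

On path 1 the misplaced `<meta>` stays where it was put, so a wt2wt test never sees a problem. On path 2 the re-parse moves it out in front of the table, which is the kind of difference that only a test going through the HTML string would catch.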
Hope this clarifies the issue.
Subbu.
The foster-parenting issues mostly arise in the wikitext -> Parsoid DOM phase. Basically, the wikitext is tokenized into an HTML tag soup, and then a customized version of the standard HTML parser is used to assemble the soup into a DOM, mimicking the process by which a browser would parse the tag soup emitted by the current PHP parser. So the existing test suite does expose these foster-parenting issues already.
Does it really? There were a number of foster-parenting issues a few months ago where Parsoid inserted <meta> tags in places where they can't be put (e.g. <tr>s), and no one in the Parsoid team seemed to have noticed until I tracked down a few VE bugs to that problem.
Roan
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l