On Wed, Jul 24, 2013 at 11:20 AM, Subramanya Sastry <ssastry@wikimedia.org> wrote:
On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to HTML *DOM* and back. This would help catch cases where the DOM doesn't cleanly round-trip through the HTML parser (foster-parenting for instance). It may be that this is already implemented, or that it was considered and rejected, I don't know.
Yes, we've considered this for a while now. It just hasn't been done yet, since until now we hadn't had a chance to work on the testing infrastructure in over 6 months.
For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
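To make the distinction concrete, here is a rough sketch of the two round-trip modes being discussed. It is not Parsoid's test harness and does not use domino's API: the four callables (`wt2dom`, `dom2wt`, `serialize`, `parse`) are hypothetical hooks, and the `toy_*` functions are deliberately lossy stand-ins chosen only so the example runs on its own.

```python
from typing import Callable, Iterable, List, TypeVar

Dom = TypeVar("Dom")


def roundtrip_via_dom(wikitext: str,
                      wt2dom: Callable[[str], Dom],
                      dom2wt: Callable[[Dom], str]) -> str:
    """wikitext -> DOM -> wikitext, never leaving the in-memory DOM."""
    return dom2wt(wt2dom(wikitext))


def roundtrip_via_html_string(wikitext: str,
                              wt2dom: Callable[[str], Dom],
                              dom2wt: Callable[[Dom], str],
                              serialize: Callable[[Dom], str],
                              parse: Callable[[str], Dom]) -> str:
    """wikitext -> DOM -> HTML string -> DOM -> wikitext."""
    html = serialize(wt2dom(wikitext))
    return dom2wt(parse(html))


def string_only_failures(cases: Iterable[str],
                         wt2dom, dom2wt, serialize, parse) -> List[str]:
    """Return cases that survive the DOM round trip but not the string one."""
    failures = []
    for wt in cases:
        dom_ok = roundtrip_via_dom(wt, wt2dom, dom2wt) == wt
        str_ok = roundtrip_via_html_string(
            wt, wt2dom, dom2wt, serialize, parse) == wt
        if dom_ok and not str_ok:
            failures.append(wt)
    return failures


# Toy stand-ins (assumptions for illustration only): the "DOM" is a list of
# words, and the "HTML string" joins them with "|". A word that itself
# contains "|" is split on reparse, so the string round trip loses structure.
def toy_wt2dom(wt: str) -> List[str]:
    return wt.split(" ")


def toy_dom2wt(dom: List[str]) -> str:
    return " ".join(dom)


def toy_serialize(dom: List[str]) -> str:
    return "|".join(dom)


def toy_parse(html: str) -> List[str]:
    return html.split("|")


if __name__ == "__main__":
    print(string_only_failures(["a b", "a|b c"],
                               toy_wt2dom, toy_dom2wt,
                               toy_serialize, toy_parse))
```

With real hooks, anything reported by `string_only_failures` would be a case where the serialize/parse step (here, the third-party library's job) changes the tree, which is exactly the class of bug the string round trip would surface.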
The foster-parenting issues mostly arise in the wikitext->Parsoid DOM phase. Basically, the wikitext is tokenized into an HTML tag soup, and then a customized version of the standard HTML parser is used to assemble the soup into a DOM, mimicking the process by which a browser would parse the tag soup emitted by the current PHP parser. So the existing test suite does expose these foster-parenting issues already.
--scott