On Wed, Jul 24, 2013 at 11:20 AM, Subramanya Sastry
<ssastry(a)wikimedia.org> wrote:
On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how it
tests things rather than what data is collected. For instance, it would be
nice if the round-trip tests could round-trip from wikitext to HTML
*string* and back, rather than to HTML *DOM* and back. This would help
catch cases where the DOM doesn't cleanly round-trip through the HTML
parser (foster-parenting for instance). It may be that this is already
implemented, or that it was considered and rejected, I don't know.
Yes, we've considered this for a while now. It just hasn't been done yet,
since we haven't had a chance to work on the testing infrastructure in over
6 months until now.
For what it's worth, both the DOM serialization-to-a-string and DOM
parsing-from-a-string are done with the domino package. It has a
substantial test suite of its own (originally from
http://www.w3.org/html/wg/wiki/Testing I believe). So although the above
is probably worth doing as a low-priority task, it's really a test of the
third-party library, not of Parsoid. (Although, since I'm a co-maintainer
of domino, I'd be very interested in fixing any bugs which it did turn up.)
The foster-parenting issues mostly arise in the wikitext->Parsoid DOM
phase. Basically, the wikitext is tokenized into an HTML tag soup and then
a customized version of the standard HTML parser is used to assemble the
soup into a DOM, mimicking the process by which a browser would parse the
tag soup emitted by the current PHP parser. So the existing test suite
does expose these foster-parenting issues already.
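
To make the string round-trip idea concrete, here is a minimal sketch in
Python. It is not Parsoid or domino code (both are JavaScript), and every
name in it is hypothetical. The toy tree builder models just one HTML5
rule, assumed here for illustration: non-whitespace text directly inside
<table> is foster-parented, i.e. inserted before the table. A real HTML
parser applies many more rules (implied <tbody>, escaping, attributes).

```python
from html.parser import HTMLParser

# Toy DOM: a node is a (tag, children) tuple; a child is a node or a str.


def serialize(node):
    """Serialize a toy DOM node to an HTML string (no escaping)."""
    if isinstance(node, str):
        return node
    tag, children = node
    return "<%s>%s</%s>" % (tag, "".join(serialize(c) for c in children), tag)


class ToyTreeBuilder(HTMLParser):
    """Builds a toy DOM and foster-parents text found directly in <table>."""

    def __init__(self):
        super().__init__()
        self.root = ["root", []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Only well-nested end tags are handled in this sketch.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        current = self.stack[-1]
        if current[0] == "table" and data.strip():
            # Foster-parenting: insert the text before the table itself.
            siblings = self.stack[-2][1]
            idx = next(i for i, c in enumerate(siblings) if c is current)
            siblings.insert(idx, data)
        else:
            current[1].append(data)


def freeze(node):
    """Convert the builder's nested lists into comparable tuples."""
    if isinstance(node, str):
        return node
    return (node[0], tuple(freeze(c) for c in node[1]))


def parse(html):
    """Parse an HTML string; return a tuple of top-level toy DOM nodes."""
    builder = ToyTreeBuilder()
    builder.feed(html)
    builder.close()
    return freeze(builder.root)[1]


def string_round_trips(dom):
    """True if DOM -> HTML string -> DOM gives back the same DOM."""
    return parse(serialize(dom)) == (dom,)


good = ("p", ("hello",))
bad = ("table", ("stray text", ("tbody", (("tr", (("td", ("cell",)),)),)),))

print(string_round_trips(good))  # True
print(string_round_trips(bad))   # False: the text moved out of the table
```

A DOM-to-DOM comparison would never see the second case, because the
stray text only moves once the serialized string is parsed again.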
--scott
--
( http://cscott.net )