For context: I've been working on replacing the html5 and jsdom modules (which depend on the native 'contextify' module) with the pure-JavaScript 'domino' implementation of DOM4.  This seems to be faster and cleaner, and it fixes some bugs caused by jsdom's eccentric DOM handling.  Domino is (in my brief experience) more reliable and standards-compliant.

Here's a list of issues I came across in the process:

* There were 3 new failures in wt2html tests.  (There were also some new passes, so the number of correct tests increases on net.)  They are:

1) "Expansion of multi-line templates in attribute values (bug 6255 sanity check 2)"
For reference, this test looks like:

!! test
Expansion of multi-line templates in attribute values (bug 6255 sanity check)
!! input
<div style="background:
#00FF00">-</div>
!! result
<div style="background: #00FF00">-</div>
!! end
!! test
Expansion of multi-line templates in attribute values (bug 6255 sanity check 2)
!! input
<div style="background: &#10;#00FF00">-</div>
!! result
<div style="background: &#10;#00FF00">-</div>
!! end

I'm not sure how this test ever passed in jsdom: to an HTML parser the two inputs are effectively identical, since character-reference decoding happens very early and &#10; becomes a literal newline.  But apparently the wikitext parser should defer processing of the &#10; somehow?  On the domino branch our HTML serialization now uses the upstream standard HTML5 serialization algorithm (http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#serializing-html-fragments), which doesn't escape newlines.  Note that the first test also involves whitespace normalization, which the PHP parser does (see https://www.mediawiki.org/wiki/Special:Code/MediaWiki/14689) but parsoid does not.  (I've got a patch to do whitespace normalization in parsoid if there's interest, but it causes other tests to break.)
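
To make the round trip described above concrete, here is a minimal sketch.  Both helpers are hypothetical (they are not domino or Parsoid functions), and the escaping rules are my reading of the attribute-mode rules in the serialization algorithm linked above; newer revisions of the spec may escape more characters.

```javascript
// Hypothetical helpers for illustration; neither is part of domino or Parsoid.

// Decode numeric character references such as &#10; or &#xA0;.
// Named references like &nbsp; are deliberately left out of this sketch.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    return String.fromCodePoint(hex ? parseInt(hex, 16) : parseInt(dec, 10));
  });
}

// Escape an attribute value for serialization.  Assumption: only "&",
// U+00A0 and '"' are replaced, so a newline passes through untouched.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/"/g, '&quot;');
}

var source = 'background: &#10;#00FF00';      // attribute text from the second test
var parsed = decodeNumericRefs(source);       // what the DOM ends up holding
var serialized = escapeAttributeValue(parsed);

console.log(JSON.stringify(parsed));
console.log(serialized.indexOf('&#10;') === -1);
```

Once the reference has been decoded, nothing in the DOM records that the newline was originally written as &#10;, so a serializer following these rules has no way to put it back.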

What's the plan to handle cases like this?  Is it really important to generate the &#10; in the output?

2) "Play a bit with r67090 and bug 3158"
This is a parsoid-only test which looks like:
!! test
Play a bit with r67090 and bug 3158
!! options
disabled
!! input
<div style="width:50% !important">&nbsp;</div>
<div style="width:50%&nbsp;!important">&nbsp;</div>
<div style="width:50%&#160;!important">&nbsp;</div>
<div style="border : solid;">&nbsp;</div>
!! result
<div style="width:50% !important">&nbsp;</div>
<div style="width:50% !important">&nbsp;</div>
<div style="width:50% !important">&nbsp;</div>
<div style="border&#160;: solid;">&nbsp;</div>
!! end

In standard HTML serialization, U+00A0 is always emitted as &nbsp;, so even if you wanted to be bug-compatible with the 'border :' style, you should be emitting &nbsp; rather than &#160; there.  The second and third cases are whitespace normalization within attributes (again).  I'm guessing jsdom (incorrectly) did this by default whether you wanted it or not; with domino, attribute normalization has to be added explicitly if it's desired.  (There's also some other reason why the 'border :' case is failing now, unrelated to the &#160; vs &nbsp; issue, which still needs to be chased down.)
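
If explicit attribute normalization is wanted, a minimal sketch could look like the following.  The function name is hypothetical, and treating U+00A0 as collapsible whitespace is an assumption read off the expected output of the width cases above; it is not taken from the PHP parser's code.

```javascript
// Hypothetical helper; not an existing domino or Parsoid function.
// Assumption: tab, CR, LF, space and U+00A0 all count as whitespace,
// and every run of them collapses to a single space.
function normalizeAttributeWhitespace(value) {
  return value.replace(/[\t\r\n \u00A0]+/g, ' ').trim();
}

var values = [
  'width:50% !important',
  'width:50%\u00A0!important', // what &nbsp; and &#160; both decode to
  'background:\n#00FF00'       // the multi-line style from the first test
];
var normalized = values.map(normalizeAttributeWhitespace);

console.log(normalized);
```

This only covers collapsing whitespace; it does nothing for the 'border :' case, where the expected output goes the other way and turns a plain space into a non-breaking one.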

3) "Parsoid-only: Table with broken attribute value quoting on consecutive lines"
!! test
Parsoid-only: Table with broken attribute value quoting on consecutive lines
!! options
disabled
!! input
{|
| title="Hello world|Foo
| style="color:red|Bar
|}
!! result
<table>
<tr>
<td title="Hello world">Foo
</td><td style="color: red;">Bar
</td></tr></table>
!! end

jsdom used to insert the extraneous semicolon at the end of the 'style' attribute.  domino does not.  I believe this test case is broken and the extraneous semicolon should be removed.

* Other observed bugs & failures:
http://parsoid.wmflabs.org/en/Pi gives:
TypeError: Cannot assign to read only property 'ksrc' of #<KV>
    at AttributeExpander._returnAttributes (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:71:20)
    at AttributeTransformManager.process (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:1017:8)
    at AttributeExpander.onToken (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:46:7)
    at AsyncTokenTransformManager.transformTokens (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:568:17)
    at AsyncTokenTransformManager.onChunk (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:356:17)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:96:17)
    at SyncTokenTransformManager.onChunk (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:904:7)
    at PegTokenizer.EventEmitter.emit (events.js:96:17)
    at PegTokenizer.process (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.tokenizer.peg.js:88:11)
    at ParserPipeline.process (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.parser.js:360:21)

http://localhost:8000/simple/Game gives:
starting parsing of Game
*********** ERROR: cs/s mismatch for node: A s: 3808; cs: 3821 ************
completed parsing of Game in 1491 ms

* [[File:]] tag parsing for images appears to be incomplete:
  a) alt= and class= are not parsed
  b) 'thumb' and 'right' should result in <img class="thumb tright" /> or some such, but there doesn't appear to be an indication of either option in the parsoid output.

* I'd like to see title and revision information in the <head>

* Interwiki links are not converted to relative links when the "interwiki" is actually the current wiki.  (Maybe this isn't really a bug.)

Let's discuss these a bit and I'll file bugzilla tickets for the bits we can agree are actually bugs. ;)
  --scott

--
                         ( http://cscott.net/ )