Hello Trevor and list,
Since last week's integration into the VisualEditor extension, things have been progressing well. Lists (including definition lists) and tables are parsed to WikiDom and rendered by the HTML serializer. The parser is still quite rough and in flux at this stage, but the general structures mostly work when running the parser tests using node.js. The Wikitext serializer is not yet wired up, as I am currently concentrating on the parser and its WikiDom output, but it will be added for round-trip testing.
Apart from general grammar tweaking, I am now working on converting inline elements into WikiDom annotations. The main challenge is the calculation of plain-text offsets. I am trying to avoid building an intermediate structure, but might fall back to one if things get too messy when interleaving this calculation with parsing.
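To illustrate the offset bookkeeping, here is a minimal sketch that works on a flat token stream rather than inside the grammar. The token shapes (`open`, `close`, `text`), the function name `tokensToAnnotatedText`, and the `{ type, range: { start, end } }` annotation layout are assumptions made for this example; they are not the actual parser output or the final WikiDom format.

```javascript
// Sketch only: token shapes and names are hypothetical.
// Tokens: { type: 'text', value } | { type: 'open', name } | { type: 'close', name }
function tokensToAnnotatedText(tokens) {
  let text = '';
  const annotations = [];
  const open = []; // currently open annotations: { type, start }

  const finish = (entry) => {
    annotations.push({
      type: entry.type,
      // Offsets are counted in the accumulated plain text, not the source.
      range: { start: entry.start, end: text.length }
    });
  };

  for (const token of tokens) {
    if (token.type === 'text') {
      text += token.value;
    } else if (token.type === 'open') {
      open.push({ type: token.name, start: text.length });
    } else if (token.type === 'close') {
      // Search from the most recent opener, so overlapping (not strictly
      // nested) markup still yields a range for each annotation.
      for (let i = open.length - 1; i >= 0; i--) {
        if (open[i].type === token.name) {
          finish(open.splice(i, 1)[0]);
          break;
        }
      }
    }
  }
  // Anything left unclosed is assumed to extend to the end of the text.
  open.forEach(finish);
  return { text, annotations };
}

// Overlapping input equivalent to <b>a<i>b</b>c</i>
const result = tokensToAnnotatedText([
  { type: 'open', name: 'b' }, { type: 'text', value: 'a' },
  { type: 'open', name: 'i' }, { type: 'text', value: 'b' },
  { type: 'close', name: 'b' }, { type: 'text', value: 'c' },
  { type: 'close', name: 'i' }
]);
// result.text is 'abc'; b covers 0..2 and i covers 1..3
```

Because the ranges refer to the plain text only, overlapping markup needs no special handling at this stage; the nesting question is deferred to whoever serializes the annotations.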
- es.AnnotationSerializer needs some nesting smartness, so that overlapped regions open and close properly (<b>a<i>b</b>c</i> should be <b>a<i>b</i></b><i>c</i> - es.ContentView does this correctly but is working from the linear data model)
Parsing these overlapped annotations is not well supported right now, but should be doable using a multi-pass or shallow (token-only) parsing strategy for inline content. Pushing nesting and content-model fix-ups to the serializer should also make it easier to approximate the parsing rules in the HTML5 specification [1] without forcing too much normalization. For the most part, the HTML5 parsing spec is a somewhat more systematic version of what tidy does right now after the MediaWiki parser has tried its best.
Some early fix-ups seem to be needed to allow proper editing, particularly of block-level elements, so I am currently a bit sceptical about avoiding normalization completely.
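One possible way to do the nesting fix-up at serialization time is sketched below. It is not the es.AnnotationSerializer or es.ContentView code; the function name `serializeAnnotated`, the annotation shape `{ type, range: { start, end } }`, and the use of the annotation type directly as the tag name are assumptions for illustration. At each annotation boundary, everything opened after the ending annotation is closed and then reopened.

```javascript
// Sketch only: names and the annotation shape are hypothetical.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function serializeAnnotated(text, annotations) {
  // Empty ranges would produce empty elements, so drop them.
  const anns = annotations.filter((a) => a.range.start < a.range.end);

  // Every position where an annotation starts or ends is a boundary.
  const offsets = new Set([0, text.length]);
  for (const a of anns) {
    offsets.add(a.range.start);
    offsets.add(a.range.end);
  }
  const sorted = [...offsets].sort((x, y) => x - y);

  let out = '';
  const stack = []; // annotations whose tags are currently open

  for (let i = 0; i < sorted.length; i++) {
    const pos = sorted[i];

    // Close the lowest ending annotation and everything opened after it,
    // then reopen those that have not actually ended yet.
    const firstEnding = stack.findIndex((a) => a.range.end === pos);
    if (firstEnding !== -1) {
      const popped = stack.splice(firstEnding);
      for (let j = popped.length - 1; j >= 0; j--) {
        out += `</${popped[j].type}>`;
      }
      for (const a of popped) {
        if (a.range.end !== pos) {
          out += `<${a.type}>`;
          stack.push(a);
        }
      }
    }

    // Open annotations starting here, longest first to reduce reopening.
    const starting = anns
      .filter((a) => a.range.start === pos)
      .sort((a, b) => b.range.end - a.range.end);
    for (const a of starting) {
      out += `<${a.type}>`;
      stack.push(a);
    }

    if (i + 1 < sorted.length) {
      out += escapeHtml(text.slice(pos, sorted[i + 1]));
    }
  }
  return out;
}

const html = serializeAnnotated('abc', [
  { type: 'b', range: { start: 0, end: 2 } },
  { type: 'i', range: { start: 1, end: 3 } }
]);
// html === '<b>a<i>b</i></b><i>c</i>'
```

This only handles inline nesting; block-level content-model fix-ups of the kind the HTML5 parsing algorithm performs are out of scope for the sketch.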
- We need some sort of context that can be asked for the HTML of a template, whether a page exists, etc. Initially this work is all done on the client, which means this is a wrapper for a lot of API calls, but either way, having a firm API between the renderer and the site context will help keep things clean and flexible
Brion already implemented a simple context object for transclusion tests. This is probably not yet the final API, but it is already a good start.
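To make the idea of a firm boundary between renderer and site context concrete, here is a minimal sketch of what such an interface could look like. It is not Brion's context object: the class name `InMemoryContext`, the methods `pageExists` and `getTemplateHtml`, the `{{{name}}}` placeholder substitution, and the helper `linkClass` are all hypothetical. The methods return promises so that an implementation backed by API calls could be swapped in without changing the renderer.

```javascript
// Sketch only: every name here is hypothetical, not the actual context API.
// An in-memory stand-in, as might be used for tests.
class InMemoryContext {
  constructor(pages) {
    // pages: plain object mapping page title -> stored content
    this.pages = new Map(Object.entries(pages));
  }

  // Resolves to true if the page is known to this context.
  async pageExists(title) {
    return this.pages.has(title);
  }

  // Resolves to the stored HTML for a template with {{{name}}} placeholders
  // filled in from params, or null if the template is unknown.
  async getTemplateHtml(title, params = {}) {
    const source = this.pages.get(`Template:${title}`);
    if (source === undefined) {
      return null;
    }
    return source.replace(/\{\{\{(\w+)\}\}\}/g, (match, name) =>
      Object.prototype.hasOwnProperty.call(params, name)
        ? String(params[name])
        : match
    );
  }
}

// The renderer only talks to the context through its methods, so it does
// not care whether answers come from memory or from the network.
async function linkClass(context, title) {
  return (await context.pageExists(title)) ? 'existing' : 'new';
}

const context = new InMemoryContext({
  'Main Page': 'Welcome',
  'Template:Hello': 'Hello, {{{name}}}!'
});
context.getTemplateHtml('Hello', { name: 'world' }).then((html) => {
  console.log(html); // Hello, world!
});
```

A client-side implementation would expose the same two methods and issue API requests internally, which keeps the renderer code unchanged.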
Gabriel
[1]: HTML5 parsing spec: http://dev.w3.org/html5/spec/Overview.html#parsing