Hi,
Today I started looking into generating something closer to WikiDom from the parser in the ParserPlayground extension. For further testing and parser development, changes to the structure will need to be mirrored in the current serializers and renderers, which likely won't be used for very long once the editor integration gets underway.
The serializers developed in wikidom/lib/es seem to be just what would be needed, so I am wondering if it would make sense to put some effort into plugging those into the parser at this early stage while converting the parser output to WikiDom. The existing round-trip and parser testing infrastructure can then already be run against these serializers.
The split codebases make this a bit harder than necessary, so maybe this would also be a good time to draw up a rough plan for what the integration should look like. Swapping the serializers will soon break the existing ParserPlayground extension, so a move to another extension or to the wikidom repository might make sense.
Looking forward to your thoughts,
Gabriel
The other day I took the time to migrate the serializers into the latest codebase.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/VisualEditor/modu...
There are lots of things we can still do to finish and polish them, but they are a good way to hit the ground running.
Some key things that need to be worked on:
- es.HtmlSerializer needs an algorithm that expands the flat list items in WikiDom into a tree structure more suitable for HTML rendering
- es.HtmlSerializer and es.WikitextSerializer need support for more things, notably definition lists, but other gaps may exist as well (and we need to define what definition lists look like in WikiDom)
- es.AnnotationSerializer needs some nesting smartness, so that overlapping regions open and close properly (<b>a<i>b</b>c</i> should become <b>a<i>b</i></b><i>c</i>; es.ContentView does this correctly, but it works from the linear data model)
- We need some sort of context that can be asked for the HTML of a template, whether a page exists, and so on. Initially this work is all done on the client, which makes the context a wrapper for a lot of API calls, but either way, having a firm API between the renderer and the site context will help keep things clean and flexible
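For the flat-to-tree expansion, one possible approach is a depth-driven stack. This is only a sketch: the item shape and the convention that the length of the `styles` array encodes nesting depth are assumptions for illustration, not the actual WikiDom schema.

```javascript
// Sketch: expand flat WikiDom-style list items into a nested tree.
// Assumes each item carries a `styles` array (e.g. ['bullet', 'bullet'])
// whose length encodes the nesting depth; names are illustrative only.
function expandListItems( items ) {
	var root = { children: [] };
	var stack = [ root ];
	items.forEach( function ( item ) {
		var depth = item.styles.length; // 1-based nesting depth
		// Trim the stack so its top is the parent for this depth
		while ( stack.length > depth ) {
			stack.pop();
		}
		// If the list jumps levels, insert placeholder parents
		while ( stack.length < depth ) {
			var placeholder = { item: null, children: [] };
			stack[ stack.length - 1 ].children.push( placeholder );
			stack.push( placeholder );
		}
		var node = { item: item, children: [] };
		stack[ stack.length - 1 ].children.push( node );
		stack.push( node ); // the node may parent deeper items
	} );
	return root.children;
}
```

The resulting tree can then be walked to emit nested ul/ol/li elements.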
The serializers depend on some static methods in es and es.Html, but are otherwise quite stand-alone. We may even want to move es and es.Html (which are very general-purpose libraries) into a shared library that both the parser and the es code can depend on.
- Trevor
On Thu, Oct 27, 2011 at 1:38 PM, Gabriel Wicke wicke@wikidev.net wrote:
As an update, definition lists are now supported: they are simply represented as "term" and "definition" in the WikiDom listItem styles attribute, alongside the already supported "bullet" and "number" styles.
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki...
It was disturbingly easy to add this functionality.
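To illustrate, a definition list in WikiDom might now look roughly like the following object. This is an illustrative sketch of the structure described above, not the exact schema from the commit; the `type`, `attributes` and `content` field names are assumptions.

```javascript
// Illustrative sketch (not the exact WikiDom schema) of a definition
// list using the new 'term' and 'definition' listItem styles:
var definitionList = {
	type: 'list',
	children: [
		{
			type: 'listItem',
			attributes: { styles: [ 'term' ] },
			content: { text: 'WikiDom' }
		},
		{
			type: 'listItem',
			attributes: { styles: [ 'definition' ] },
			content: { text: 'A document object model for wikitext' }
		}
	]
};
```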
- Trevor
On Fri, Nov 4, 2011 at 2:33 PM, Trevor Parscal tparscal@wikimedia.org wrote:
Hello Trevor and list,
since last week's integration into the VisualEditor extension, things are progressing well. Lists (including definition lists) and tables are parsed to WikiDom and rendered by the HTML serializer. The parser is still quite rough and in flux at this stage, but the general structures mostly work when running the parser tests using node.js. The Wikitext serializer is not yet wired up, since I am currently concentrating on the parser and its WikiDom output, but it will be added for round-trip testing.
Apart from general grammar tweaking I am now working on a conversion of inline elements into WikiDom annotations. The main challenge is the calculation of plain-text offsets. I am trying to avoid building an intermediate structure, but might fall back to it if things get too messy when interleaving this calculation with parsing.
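The offset bookkeeping described above could, as a minimal sketch, be interleaved with tokenization roughly as follows. The flat token stream with 'text', 'open' and 'close' tokens is an assumption for illustration; the actual parser's token shapes may differ, and this naive version also ignores the overlapping-annotation case.

```javascript
// Sketch: accumulate plain text from inline tokens while recording
// annotation ranges as character offsets into that text.
// Token shapes ('text' / 'open' / 'close') are hypothetical.
function tokensToAnnotatedText( tokens ) {
	var text = '';
	var annotations = [];
	var open = {}; // annotation name -> start offset
	tokens.forEach( function ( token ) {
		if ( token.type === 'text' ) {
			text += token.value;
		} else if ( token.type === 'open' ) {
			open[ token.name ] = text.length;
		} else if ( token.type === 'close' ) {
			annotations.push( {
				type: token.name,
				range: { start: open[ token.name ], end: text.length }
			} );
			delete open[ token.name ];
		}
	} );
	return { text: text, annotations: annotations };
}
```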
- es.AnnotationSerializer needs some nesting smartness, so that overlapped regions open and close properly (<b>a<i>b</b>c</i> should be <b>a<i>b</i></b><i>c</i> - es.ContentView does this correctly but is working from the linear data model)
Parsing these overlapped annotations is not well supported right now, but should be doable using a multi-pass or shallow (token-only) parsing strategy for inline content. Pushing nesting and content-model fix-ups to the serializer should also make it easier to approximate the parsing rules in the HTML5 specification [1] without forcing too much normalization. The HTML5 parsing spec is mostly a somewhat more systematic version of what Tidy currently does after the MediaWiki parser has tried its best.
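A naive first cut at serializer-side nesting fix-ups might look like the sketch below: split the text at every annotation boundary and wrap each segment in all annotations covering it. The function and field names are hypothetical, and unlike es.ContentView's output it does not merge adjacent identical tags (so <b>a</b> and <i><b>b</b></i> stay separate rather than sharing one <b>), but the markup it produces is always well-nested.

```javascript
// Sketch: render possibly-overlapping annotation ranges as well-nested
// HTML by breaking the text at every annotation boundary.
function renderAnnotations( text, annotations ) {
	// Collect unique, sorted break points
	var points = [ 0, text.length ];
	annotations.forEach( function ( a ) {
		points.push( a.range.start, a.range.end );
	} );
	points = points
		.filter( function ( p, i ) { return points.indexOf( p ) === i; } )
		.sort( function ( a, b ) { return a - b; } );
	var html = '';
	for ( var i = 0; i < points.length - 1; i++ ) {
		var start = points[ i ], end = points[ i + 1 ];
		var segment = text.slice( start, end );
		// Wrap the segment in every annotation that covers it
		annotations.forEach( function ( a ) {
			if ( a.range.start <= start && a.range.end >= end ) {
				segment = '<' + a.tag + '>' + segment + '</' + a.tag + '>';
			}
		} );
		html += segment;
	}
	return html;
}
```

For the example from the thread ('abc' with bold over [0,2) and italic over [1,3)), this yields <b>a</b><i><b>b</b></i><i>c</i>, which a later pass could simplify to <b>a<i>b</i></b><i>c</i>.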
Some early fix-ups seem to be needed to allow proper editing, particularly of block-level elements, so I am currently a bit sceptical about avoiding normalization completely.
- We need some sort of context that can be asked for the HTML of a template, whether a page exists, etc. Initially this work is all done on the client, which means this is a wrapper for a lot of API calls, but either way, having a firm API between the renderer and the site context will help keep things clean and flexible
Brion already implemented a simple context object for transclusion tests. This is probably not yet the final API, but already a good start.
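As a sketch of what such a firm API might eventually look like: this is purely illustrative, not Brion's actual context object, and the method names, query parameters and `transport` abstraction are all invented for the example.

```javascript
// Hypothetical site-context interface: the renderer talks to this
// object, which wraps whatever transport (API calls, test fixtures)
// actually answers the questions.
function SiteContext( transport ) {
	this.transport = transport;
}
SiteContext.prototype.pageExists = function ( title, callback ) {
	this.transport.request( { action: 'query', titles: title },
		function ( result ) {
			callback( !result.missing );
		} );
};
SiteContext.prototype.expandTemplate = function ( name, params, callback ) {
	this.transport.request(
		{ action: 'expandtemplates', template: name, params: params },
		callback
	);
};
```

A test context could then plug in a stub transport that answers from fixtures, while the client version wraps real API requests.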
Gabriel
[1]: HTML5 parsing spec: http://dev.w3.org/html5/spec/Overview.html#parsing