Awesome -- thanks!

I have a couple of questions to ask, too. These aren't urgent, so feel free to reply whenever you have the time.

1) Conformance vs. enhancement

I'm not sure if you saw my comment in the diff, but some of the ISBN validation code I wrote is deliberately unreachable because it's stricter than the naive validation MediaWiki currently performs. (I stupidly wrote it before checking exactly what the current parser does.) In general, is it ever desirable to try to improve on the old parsing grammar, or is total conformance the overriding goal?
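
To make the distinction concrete, here is a rough sketch of the two levels of strictness I mean. The "naive" check is only an illustration of shape-only matching, not MediaWiki's actual regex. The strict one also verifies the check digit (ISBN-10: weights 10 down to 1, total divisible by 11; ISBN-13: alternating weights 1 and 3, total divisible by 10). The function names are made up for this email.

```javascript
// Hypothetical illustration only; neither function is MediaWiki's code.

// Drop the spaces and hyphens that commonly appear inside ISBNs.
function normalizeIsbn(raw) {
  return raw.replace(/[\s-]/g, '').toUpperCase();
}

// Naive: accept anything shaped like an ISBN-10 (9 digits + digit or X)
// or an ISBN-13 (13 digits), without checking the check digit.
function looksLikeIsbn(raw) {
  var s = normalizeIsbn(raw);
  return /^\d{9}[\dX]$/.test(s) || /^\d{13}$/.test(s);
}

// Strict: same shape check, plus check-digit verification.
function isValidIsbn(raw) {
  var s = normalizeIsbn(raw);
  var sum = 0, i, c;
  if (/^\d{9}[\dX]$/.test(s)) {
    // ISBN-10: digit i (0-based) has weight 10 - i; a final X means 10.
    for (i = 0; i < 10; i++) {
      c = s.charAt(i);
      sum += (c === 'X' ? 10 : Number(c)) * (10 - i);
    }
    return sum % 11 === 0;
  }
  if (/^\d{13}$/.test(s)) {
    // ISBN-13: weights alternate 1, 3, 1, 3, ...
    for (i = 0; i < 13; i++) {
      sum += Number(s.charAt(i)) * (i % 2 === 0 ? 1 : 3);
    }
    return sum % 10 === 0;
  }
  return false;
}
```

With the textbook example 0-306-40615-2, both functions accept it. Change the last digit to 3 and only looksLikeIsbn still does, which is the kind of input where the stricter code would change today's behavior.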

2) ECMAScript target

Should I strive to write code that is compatible with older browsers? Thus far I've avoided ES5 array methods like forEach, map, and filter. But if this is only going to run server-side under Node for the foreseeable future, that may be an unnecessary handicap.
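
For reference, the trade-off looks roughly like this. The sketch shows the same hypothetical transformation written twice: once with a plain loop that any ECMAScript 3 engine accepts, and once with the ES5 filter/map methods that Node supports natively. The data and names are invented for illustration.

```javascript
// Illustrative input: a mix of behavior-switch words and other text.
var words = ['__TOC__', 'plain text', '__NOTOC__', '__NOEDITSECTION__'];

// Matches a word such as __TOC__ (illustrative pattern only).
function isSwitch(word) {
  return /^__[A-Z]+__$/.test(word);
}

// ES3-compatible: explicit loop and push.
function switchNamesLoop(list) {
  var out = [];
  for (var i = 0; i < list.length; i++) {
    if (isSwitch(list[i])) {
      out.push(list[i].slice(2, -2).toLowerCase());
    }
  }
  return out;
}

// ES5: the same logic with filter and map.
function switchNamesEs5(list) {
  return list
    .filter(isSwitch)
    .map(function (w) { return w.slice(2, -2).toLowerCase(); });
}
```

Both versions return ['toc', 'notoc', 'noeditsection'] for the input above. The ES5 version is shorter but would need a shim on pre-ES5 browsers such as IE8.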

3) Intermediate representation vs. parser hacks

The preliminary work I've done to parse behavior switches (__TOC__ & friends) has the parser directly toggle attributes on a configuration object, rather than produce a token for some subsequent tree visitor to interpret. So the parsing is arguably somewhat lossy in this case, but possibly not: when serializing back to wikitext, you could read all behaviors from the configuration object and encode them at the top, since their position isn't significant, afaik.
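
Here's a simplified sketch of that approach, with a regex standing in for the actual PEG.js rules. KNOWN_SWITCHES and the function names are hypothetical. Note that the round trip is not byte-identical: the switch moves to the top and leaves an empty line behind, which is exactly the lossiness in question. One caveat worth double-checking: MediaWiki's help pages describe __TOC__ as placing the table of contents where the word appears. It may therefore be an exception to "position isn't significant", so I've left it out of the list below.

```javascript
// Hypothetical list of position-independent switches.
var KNOWN_SWITCHES = ['NOTOC', 'FORCETOC', 'NOEDITSECTION', 'NOGALLERY'];

// Remove recognised switches from the text and record them as flags.
function extractSwitches(wikitext, config) {
  return wikitext.replace(/__([A-Z]+)__/g, function (match, name) {
    if (KNOWN_SWITCHES.indexOf(name) === -1) {
      return match; // unknown word: leave the text untouched
    }
    config[name] = true;
    return '';
  });
}

// Serialize by writing every enabled switch at the top of the page.
function serializeWithSwitches(body, config) {
  var header = '';
  for (var i = 0; i < KNOWN_SWITCHES.length; i++) {
    if (config[KNOWN_SWITCHES[i]]) {
      header += '__' + KNOWN_SWITCHES[i] + '__\n';
    }
  }
  return header + body;
}
```

For example, extracting from 'Intro\n__NOTOC__\nMore' leaves 'Intro\n\nMore' with config.NOTOC set. Serializing gives back '__NOTOC__\nIntro\n\nMore': the meaning is the same, but the text is not.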

So: did I make the right decision? PEG.js's support for arbitrary JavaScript code in grammar actions makes this tempting at times, but perhaps it's wiser to insist that the parser do nothing beyond building a tree. What is your take?
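
For comparison, here's a sketch of the alternative. The parser just emits a token for each switch, in order, and a separate pass turns tokens into configuration. The token shape is made up and is not the format our parser actually uses. Because nothing is discarded, serialization reproduces the input exactly.

```javascript
// Stand-in for the grammar: split text into text and switch tokens,
// keeping them in source order so positions are preserved.
function tokenize(wikitext) {
  var tokens = [];
  var re = /__([A-Z]+)__/g;
  var last = 0, m;
  while ((m = re.exec(wikitext)) !== null) {
    if (m.index > last) {
      tokens.push({ type: 'text', value: wikitext.slice(last, m.index) });
    }
    tokens.push({ type: 'behavior-switch', name: m[1] });
    last = re.lastIndex;
  }
  if (last < wikitext.length) {
    tokens.push({ type: 'text', value: wikitext.slice(last) });
  }
  return tokens;
}

// Later pass: interpret switch tokens as configuration flags.
function collectSwitches(tokens) {
  var config = {};
  tokens.forEach(function (t) {
    if (t.type === 'behavior-switch') {
      config[t.name] = true;
    }
  });
  return config;
}

// Serializer: emit every token back in its original order.
function serialize(tokens) {
  return tokens.map(function (t) {
    return t.type === 'text' ? t.value : '__' + t.name + '__';
  }).join('');
}
```

The cost is an extra pass, and every later stage has to pass switch tokens through. The benefit is that the parser stays a pure tree builder and round-tripping is exact.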

I'm CCing wikitext-l, since your responses are likely to be useful to future contributors.

Best,
Ori

On Mon, Apr 2, 2012 at 12:41 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
Hi Ori,

I committed your patch in https://gerrit.wikimedia.org/r/#change,4094. Looks good, and lets 14 more tests pass ;) Thanks!!

Gabriel


On Mon, Apr 2, 2012 at 9:59 AM, Ori Livneh <ori.livneh@gmail.com> wrote:
Hey Gabriel,

Working on this on stolen time, as it were, but I do have some modest progress to report. The attached patch revises the handling of RFC auto-links and adds support for ISBNs, PMID links, and preliminary support for behavior switches (things like __TOC__).
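
Roughly, the RFC/PMID auto-link handling boils down to something like the sketch below. It's a regex stand-in for the grammar rules, and the URL templates are placeholders: MediaWiki builds the real links from its own site-configured URLs, which this doesn't model.

```javascript
// Placeholder URL templates ($1 is replaced by the number).
var MAGIC_LINK_URLS = {
  RFC: 'https://tools.ietf.org/html/rfc$1',
  PMID: 'https://www.ncbi.nlm.nih.gov/pubmed/$1'
};

// Turn "RFC 1234" / "PMID 1234" into links; \b prevents matches
// inside longer words such as "XRFC 1".
function linkMagicWords(text) {
  return text.replace(/\b(RFC|PMID) +(\d+)\b/g, function (match, kind, num) {
    var url = MAGIC_LINK_URLS[kind].replace('$1', num);
    return '<a href="' + url + '">' + match + '</a>';
  });
}
```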

I applied for git access, so hopefully I'll have my own branch set up soon and we'll be able to do this in a slightly less haphazard way.

Hope you've had a nice trip,

Ori