Awesome -- thanks!
I have a couple of questions to ask, too. These aren't urgent, so feel free
to reply whenever you have the time.
1) Conformance vs. enhancement
I'm not sure if you saw my comment in the diff, but some of the ISBN
validation code I wrote is deliberately unreachable because it's stricter
than the naive validation MediaWiki currently performs. (I stupidly
wrote it before checking to see what exactly the current parser does.) In
general, is it ever desirable to try to improve on the old parsing grammar,
or is total conformance the overriding goal?
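
To make the conformance question concrete, here is a minimal sketch of the two levels of strictness. The function names are hypothetical and not taken from the patch, and the "naive" version is only an assumption about what a shape-only check looks like; it is not a transcription of what MediaWiki does.

```javascript
// Hypothetical helpers for illustration; not the code from the patch.

// Shape-only check (assumed): strip separators, then accept any string that
// has the right length and character classes for an ISBN-10 or ISBN-13.
function looksLikeIsbn(text) {
    var digits = text.replace(/[\s-]/g, '');
    return /^(\d{9}[\dXx]|\d{13})$/.test(digits);
}

// Stricter check: same shape test, plus verification of the check digit.
function hasValidIsbnChecksum(text) {
    var digits = text.replace(/[\s-]/g, '').toUpperCase();
    var sum = 0, i;
    if (/^\d{9}[\dX]$/.test(digits)) {
        // ISBN-10: weights run 10..1, 'X' stands for 10, and the weighted
        // sum must be divisible by 11.
        for (i = 0; i < 10; i++) {
            sum += (digits.charAt(i) === 'X' ? 10 : Number(digits.charAt(i))) * (10 - i);
        }
        return sum % 11 === 0;
    }
    if (/^\d{13}$/.test(digits)) {
        // ISBN-13: weights alternate 1, 3 and the weighted sum must be
        // divisible by 10.
        for (i = 0; i < 13; i++) {
            sum += Number(digits.charAt(i)) * (i % 2 === 0 ? 1 : 3);
        }
        return sum % 10 === 0;
    }
    return false;
}
```

An input such as "0-306-40615-3" passes the shape-only check but fails the checksum, which is exactly the kind of case where a stricter parser would diverge from the existing one.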
2) ECMAScript target
Should I strive to write code that is compatible with older browsers? Thus
far I've avoided ES5 constructs like forEach, map, filter, etc. But if
this is only going to run server-side under node for the foreseeable
future, maybe that's an unnecessary handicap.
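
For illustration, the trade-off looks roughly like this. The token list and function names are made up for the example; both functions are meant to return the same result, one using only an explicit loop and one using the array methods mentioned above.

```javascript
// Hypothetical token list, for illustration only.
var tokens = [
    { type: 'text', value: 'See ' },
    { type: 'isbn', value: '0306406152' },
    { type: 'text', value: ' and ' },
    { type: 'isbn', value: '080442957X' }
];

// Explicit loop: works in older browsers that lack the array methods.
function isbnValuesLoop(list) {
    var out = [], i;
    for (i = 0; i < list.length; i++) {
        if (list[i].type === 'isbn') {
            out.push(list[i].value);
        }
    }
    return out;
}

// Array methods: shorter, but requires filter and map to exist on
// Array.prototype, which node provides.
function isbnValuesArrayMethods(list) {
    return list
        .filter(function (t) { return t.type === 'isbn'; })
        .map(function (t) { return t.value; });
}
```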
3) Intermediate representation vs. parser hacks
The preliminary work I've done to parse behavior switches (__TOC__ &
friends) has the parser directly toggle attributes on a configuration
object. It does this rather than produce a token for some subsequent tree
visitor to interpret. So the parsing is arguably somewhat lossy in this
case, but possibly not: when serializing back to wikitext, you could read
all behaviors from the configuration object and encode them at the top;
their position isn't significant, afaik.
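
The two designs can be sketched in plain JavaScript as follows. This is not the PEG.js grammar from the patch: every name below is hypothetical, a regular expression stands in for the grammar rule, and only three switches are recognized to keep the example short.

```javascript
// Hypothetical stand-in for the grammar rule that matches behavior switches.
var SWITCH_RE = /__(TOC|NOTOC|FORCETOC)__/g;

// Design A: the parser toggles attributes on a configuration object as a
// side effect, and the switch leaves no trace in the token stream.
function parseWithSideEffects(text, config) {
    var stripped = text.replace(SWITCH_RE, function (match, name) {
        config[name.toLowerCase()] = true;
        return '';
    });
    return [{ type: 'text', value: stripped }];
}

// Design B: the parser only emits tokens, so the position of each switch
// is preserved in the output.
function parseToTokens(text) {
    var tokens = [], last = 0, m;
    SWITCH_RE.lastIndex = 0;
    while ((m = SWITCH_RE.exec(text)) !== null) {
        if (m.index > last) {
            tokens.push({ type: 'text', value: text.slice(last, m.index) });
        }
        tokens.push({ type: 'behavior-switch', name: m[1].toLowerCase() });
        last = SWITCH_RE.lastIndex;
    }
    if (last < text.length) {
        tokens.push({ type: 'text', value: text.slice(last) });
    }
    return tokens;
}

// A later visitor derives the same configuration from the token stream.
function collectSwitches(tokens) {
    var config = {}, i;
    for (i = 0; i < tokens.length; i++) {
        if (tokens[i].type === 'behavior-switch') {
            config[tokens[i].name] = true;
        }
    }
    return config;
}
```

Both designs end up with the same configuration; the difference is that design B keeps where each switch appeared, while design A would have to re-emit the switches at a fixed place when serializing.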
So: did I make the right decision? PEG.js's support for arbitrary
JavaScript code in the grammar makes this tempting to do sometimes, but
perhaps it's wiser to insist that the parser not overreach the goal of
building a tree. What is your take?
I'm CCing wikitext-l, since your responses are likely to be useful to
future contributors.
Best,
Ori
On Mon, Apr 2, 2012 at 12:41 PM, Gabriel Wicke <gwicke(a)wikimedia.org> wrote:
Hi Ori,
I committed your patch in
https://gerrit.wikimedia.org/r/#change,4094.
Looks good, and lets 14 more tests pass ;) Thanks!!
Gabriel
On Mon, Apr 2, 2012 at 9:59 AM, Ori Livneh <ori.livneh(a)gmail.com> wrote:
Hey Gabriel,
Working on this on stolen time, as it were, but I do have some modest
progress to report. The attached patch revises the handling of RFC
auto-links and adds support for ISBNs, PMID links, and preliminary support
for behavior switches (things like __TOC__).
I applied for git access, so hopefully I'll have my own branch set up
soon and we'll be able to do this in a slightly less haphazard way.
Hope you've had a nice trip,
Ori