On 8/17/06, Eric Astor <eastor1@swarthmore.edu> wrote:
Single case that shows something interesting: '''hi''hello'''hi'''hello''hi'''
Try running it through MediaWiki, and what do you get? <b>hi<i>hello</i></b><i>hi<b>hello</b></i><b>hi</b>
That's awesome :)
In other words, you've discovered that the current syntax supports improper nesting of markup, in a rather unique fashion. I don't know of any way to duplicate this in any significantly formal system, although I believe a multiple-pass parser *might* be capable of handling it. In fact, some sort of multiple-pass parser (the MediaWiki parser) obviously can.
Is this not the sort of "backwards compatibility" that we could safely do without? Does anyone intentionally use that kind of construct?
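For what it's worth, the output quoted above is what you would get from a single pass that toggles bold and italic and, whenever the tag being closed is not the innermost one, closes the inner tag and reopens it afterwards. A minimal sketch of that idea follows. It is not the MediaWiki code; the function name is made up, and it only handles runs of exactly two or three apostrophes.

```python
import re

def render_quotes(text):
    """Toggle <b>/<i> on ''' and '' while keeping the emitted tags well nested."""
    out = []
    open_tags = []  # currently open tags, innermost last
    for token in re.split(r"('''|'')", text):
        if token not in ("'''", "''"):
            out.append(token)  # plain text (possibly empty)
            continue
        tag = "b" if token == "'''" else "i"
        if tag not in open_tags:
            # Opening toggle
            open_tags.append(tag)
            out.append(f"<{tag}>")
            continue
        # Closing toggle: close anything nested inside first, then reopen it
        reopened = []
        while open_tags[-1] != tag:
            inner = open_tags.pop()
            out.append(f"</{inner}>")
            reopened.append(inner)
        open_tags.pop()
        out.append(f"</{tag}>")
        for inner in reversed(reopened):
            open_tags.append(inner)
            out.append(f"<{inner}>")
    # Close whatever is still open at the end of the input
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)

print(render_quotes("'''hi''hello'''hi'''hello''hi'''"))
# <b>hi<i>hello</i></b><i>hi<b>hello</b></i><b>hi</b>
```

That only covers this one case, of course; it says nothing about longer runs of apostrophes or about how the real parser decides what a run means.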
Also, templates need to be transcluded before most of the parsing can take place, since in the current system, the text may leave some syntactically-significant constructs incomplete, finishing them in the transclusion stage...
That's sort of a given, isn't it? What's the downside of doing transclusion first?
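A toy illustration of the point about incomplete constructs, assuming a hypothetical parameterless template called "closer" and a dictionary standing in for the template store. The page text on its own opens bold and never closes it; only after expansion is the markup balanced.

```python
import re

def expand_templates(text, templates):
    """Replace each {{name}} with its body; unknown names are left untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: templates.get(m.group(1), m.group(0)),
        text,
    )

# Hypothetical page and template, purely for illustration
page = "'''Warning: {{closer}} back to normal"
templates = {"closer": "read this first'''"}

expanded = expand_templates(page, templates)
print(page.count("'''"), expanded.count("'''"))  # 1 2
print(expanded)  # '''Warning: read this first''' back to normal
```

Parsing the page before expansion would see an unterminated bold; parsing after expansion sees a complete one.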
if it had been properly escaped). This even holds true for bold and italics, since you need indefinite lookahead to be able to tell whether the first three quotes in '''this'' should be parsed as ''', '<i>, or <b>. The situation gets even worse when you try to allow for improper nesting.
Personally I find the rules for multiple apostrophes very strange and unpredictable - and hence worth changing. I was really surprised when I sat down one day to test what happens when you stack one, two, three...ten apostrophes. Not what I expected at all. No takers to replace ''' with // or something?
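To make the lookahead point concrete, here is a deliberately simplified decision rule. It is an assumption for illustration, not MediaWiki's actual behaviour: the meaning of a leading ''' is chosen by scanning the rest of the line, however long that is.

```python
def read_leading_triple(line):
    """Classify a leading ''' by looking at everything that follows it.

    Hypothetical rule: a later ''' makes it bold, a later '' makes it an
    apostrophe followed by italic, and otherwise it is literal text.
    """
    if not line.startswith("'''"):
        raise ValueError("line must start with three apostrophes")
    rest = line[3:]
    if "'''" in rest:
        return "bold"
    if "''" in rest:
        return "apostrophe + italic"
    return "literal"

print(read_leading_triple("'''this'''"))  # bold
print(read_leading_triple("'''this''"))   # apostrophe + italic
print(read_leading_triple("'''this"))     # literal
```

The deciding token can sit arbitrarily far to the right, so no fixed number of lookahead characters is enough.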
Other places require fixed, but large, amounts of lookahead... freelinks require at least 9 characters, for example. Technically, I'll admit that a
What's a freelink?
GLR parser (or a backtracking framework) could manage even the indefinite lookahead that I mentioned... but it's still problematic, since the grammar is left ambiguous in certain cases.
Oh, right - and we'd need to special-case every tag-style piece of markup, including every allowed HTML tag, since formal grammars generally can't reference previously-matched text. This also applies to the heading levels - we'd need separate ad-hoc constructs for each level of heading we wanted to support, duplicating a lot of the grammar between each one.
I don't understand, can you give an example?
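One way to read the heading remark, sketched in Python under my own assumptions (the rule names and the six-level limit are made up for illustration): a regex with a back-reference can say "the same run of = signs that opened the line must close it" in one pattern, whereas a plain BNF-style grammar has no back-references, so it needs one rule per level.

```python
import re

# One pattern for every level: \1 must repeat whatever group 1 matched.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def heading_level(line):
    """Return (level, title) for a heading line, or None if it is not one."""
    m = HEADING.match(line)
    return (len(m.group(1)), m.group(2)) if m else None

def per_level_rules(max_level=6):
    """Spell out the per-level rules a grammar without back-references needs."""
    rules = []
    for n in range(1, max_level + 1):
        marker = "=" * n
        rules.append(f'heading{n} ::= "{marker}" text "{marker}"')
    return rules

print(heading_level("== Title =="))  # (2, 'Title')
print("\n".join(per_level_rules(3)))
# heading1 ::= "=" text "="
# heading2 ::= "==" text "=="
# heading3 ::= "===" text "==="
```

The same duplication would apply to tag-style markup: without a way to say "close with the name you opened with", each allowed tag name gets its own open/close rule.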
P.S. As indicated above, I honestly feel that the difficulties aren't insurmountable - if you're willing to build an appropriate parsing framework, which will be semi-formal at best.
What would such a thing look like? Formal BNF rules mixed in with text like "Actually, if FOO is 'boo' then special case Z is invoked..."?
Steve