Re: [Wikitech-l] Dajoo: a Java-based offline editor/viewer

17 Aug 2006


On 8/17/06, Eric Astor <eastor1@swarthmore.edu> wrote:
...
Single case that shows something interesting:
'''hi''hello'''hi'''hello''hi'''
Try running it through MediaWiki, and what do you get?
<b>hi<i>hello</i></b><i>hi<b>hello</b></i><b>hi</b>
That's awesome :)
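
For anyone who wants to play with this offline, here is a rough sketch of a state machine that reproduces the output quoted above. It is not MediaWiki's actual code: the function name `render_quotes`, the state names, and the transition table are my own assumptions, and it only understands runs of exactly two or three apostrophes (longer runs are passed through untouched).

```python
import re

# (run length, current state) -> (markup to emit, next state).
# States: "" = nothing open, "i"/"b" = one tag open,
# "bi" = <b> opened first then <i>, "ib" = <i> opened first then <b>.
TRANSITIONS = {
    (2, ""): ("<i>", "i"),
    (2, "i"): ("</i>", ""),
    (2, "b"): ("<i>", "bi"),
    (2, "bi"): ("</i>", "b"),
    (2, "ib"): ("</b></i><b>", "b"),  # close both, reopen bold
    (3, ""): ("<b>", "b"),
    (3, "b"): ("</b>", ""),
    (3, "i"): ("<b>", "ib"),
    (3, "ib"): ("</b>", "i"),
    (3, "bi"): ("</i></b><i>", "i"),  # close both, reopen italic
}

# Markup needed to close whatever is still open at the end of the text.
CLOSERS = {"": "", "i": "</i>", "b": "</b>", "bi": "</i></b>", "ib": "</b></i>"}


def render_quotes(text):
    """Toy renderer for '' and ''' only; a sketch, not the real parser."""
    out = []
    state = ""
    # The capturing group keeps the apostrophe runs in the split result.
    for token in re.split(r"('{2,})", text):
        key = (len(token), state)
        if token.startswith("''") and key in TRANSITIONS:
            markup, state = TRANSITIONS[key]
            out.append(markup)
        else:
            out.append(token)  # plain text, or a run this sketch ignores
    out.append(CLOSERS[state])
    return "".join(out)


print(render_quotes("'''hi''hello'''hi'''hello''hi'''"))
```

The "improper nesting" never reaches the output: whenever the inner tag is not the one being closed, the sketch closes both tags and reopens the survivor, which is why the result is well-formed HTML.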
...
In other words, you've discovered that the current syntax supports improper
nesting of markup, in a rather unique fashion. I don't know of any way to
duplicate this in any significantly formal system, although I believe a
multiple-pass parser *might* be capable of handling it. In fact, some sort
of multiple-pass parser (the MediaWiki parser) obviously can.
Is this not the sort of "backwards compatibility" that we could safely
do without? Does anyone intentionally use that kind of construct?
...
Also, templates need to be transcluded before most of the parsing can take
place, since in the current system, the text may leave some
syntactically-significant constructs incomplete, finishing them in the
transclusion stage...
That's sort of a given, isn't it? What's the downside of doing
transclusion first?
...
if it had been properly escaped). This even holds true for bold and italics,
since you need indefinite lookahead to be able to tell whether the first
three quotes in '''this'' should be parsed as ''', <i>', or <b>. The
situation gets even worse when you try to allow for improper nesting.
Personally I find the rules for multiple apostrophes very strange and
unpredictable - and hence worth changing. I was really surprised when
I sat down one day to test what happens when you stack one, two,
three...ten apostrophes. Not what I expected at all. No takers to
replace ''' with // or something?
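
To make the lookahead point concrete, here is a small sketch. The parity rule in it is an assumption chosen for illustration (I am not claiming it is exactly what MediaWiki does), and `classify_leading_bold` is a made-up name. What it shows is that the reading of a line-initial ''' cannot be settled until every later run of apostrophes on the line has been seen.

```python
import re


def classify_leading_bold(line):
    """Guess how a line-initial ''' should be read.

    Assumed rule, for illustration only: if the line holds an odd number
    of ''' runs AND an odd number of '' runs, the leading ''' is read as a
    literal apostrophe followed by an italic opener; otherwise as bold.
    """
    if not line.startswith("'''") or line.startswith("''''"):
        raise ValueError("line must start with exactly three apostrophes")
    runs = re.findall(r"'{2,}", line)  # scans the WHOLE line
    bold_runs = sum(1 for run in runs if len(run) == 3)
    italic_runs = sum(1 for run in runs if len(run) == 2)
    if bold_runs % 2 == 1 and italic_runs % 2 == 1:
        return "apostrophe + <i>"
    return "<b>"


print(classify_leading_bold("'''this''"))
print(classify_leading_bold("'''this'' plus any amount of text, then '''"))
```

The two calls differ only in what comes at the far end of the line, yet under the assumed rule they classify the same opening ''' differently. No fixed amount of lookahead covers that.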
...
Other places require fixed, but large, amounts of lookahead... freelinks
require at least 9 characters, for example. Technically, I'll admit that a
What's a freelink?
...
GLR parser (or a backtracking framework) could manage even the indefinite
lookahead that I mentioned... but it's still problematic, since the grammar
is left ambiguous in certain cases.
Oh, right - and we'd need to special-case every tag-style piece of markup,
including every allowed HTML tag, since formal grammars generally can't
reference previously-matched text. This also applies to the heading levels -
we'd need separate ad-hoc constructs for each level of heading we wanted to
support, duplicating a lot of the grammar between each one.
I don't understand, can you give an example?
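
My best guess at what is meant for headings, as a sketch: a grammar notation without back-references has to spell out one rule per level, whereas a pattern that can refer back to what it already matched needs only one. The rule syntax, the names `PER_LEVEL_RULES`, `HEADING` and `parse_heading`, and the limit of six levels are assumptions of mine.

```python
import re

# Without back-references: one near-identical rule per heading level.
PER_LEVEL_RULES = [
    f'heading{n} ::= "{"=" * n}" text "{"=" * n}"' for n in range(1, 7)
]

# With a back-reference: \1 must repeat exactly what group 1 matched,
# so a single pattern covers every level.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def parse_heading(line):
    """Return (level, title) for a heading line, or None otherwise."""
    match = HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


for rule in PER_LEVEL_RULES:
    print(rule)
print(parse_heading("== Title =="))
print(parse_heading("plain text"))
```

The same duplication would apply to tag-style markup: without a way to say "the closing name must equal the opening name", each allowed tag needs its own open/close rule pair.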
...
P.S. As indicated above, I honestly feel that the difficulties aren't
insurmountable - if you're willing to build an appropriate parsing
framework, which will be semi-formal at best.
What would such a thing look like: formal BNF rules mixed in with text
like "Actually, if FOO is 'boo' then special case Z is invoked..."?
Steve
