On Fri, Sep 25, 2009 at 8:38 AM, Happy-melon <happy-melon@live.com> wrote:
The 10% drove people off cliffs because it is, pretty much by definition, the horrible unexpected behaviour that is a *consequence* of not having a formal definition. Writing a formal definition is not impossible if you require that it be sensible at the final reading. The parser is, in many places, *not* sensible, and naturally those quirks are difficult to describe, but they're also undesirable overall. A true move to a formal language definition involves action from both ends: writing a formal definition that follows the current parser in general, *and* being prepared to alter the parser to remove some of the more egregious deviations from expected behaviour.
I just wanted to state for the record that when we were talking about this last time, the developers (Brion included) were actually quite open to the idea of changing the semantics of wikitext constructs that weren't widely used. In other words, it was OK to build a new parser which was incompatible with the old parser, as long as that didn't break too much existing wikitext ("too much" being on the order of 1 or 2% of articles).
Another comment:
The problem is the ambiguity with italics (''italics''). So the current parser doesn't really make its final decision on what should be bold or what should be italic until it hits a newline. If there is an even number of both bold and italic markers, then it assumes it interpreted the line correctly.
...
I think this is part of what makes wikitext undescribable in a formal grammar.
Yeah, but from memory, using ANTLR's formal-grammar-breaking features, this wasn't a massive problem. A small, annoying one, to be sure, but not a killer. It does tend to mean potentially a lot of back-tracking though, which is slow...
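For what it's worth, the "don't decide until the newline" behaviour quoted above can be sketched roughly as follows. This is a minimal Python sketch, not the actual MediaWiki parser code: the name resolve_quotes, the token shapes, and the fallback rule (when both counts are odd, re-read the first bold marker as a literal apostrophe followed by an italic marker) are all assumptions made for illustration.

```python
import re

# Runs of two or more apostrophes are candidate bold/italic markers.
QUOTE_RUN = re.compile(r"('{2,})")


def resolve_quotes(line):
    """Return (kind, text) tokens for one line.

    kind is 'text', 'italic' or 'bold'. Hypothetical helper for illustration.
    """
    parts = QUOTE_RUN.split(line)  # odd indexes hold the apostrophe runs
    tokens = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            if part:
                tokens.append(["text", part])
            continue
        n = len(part)
        if n == 2:
            tokens.append(["italic", ""])
        elif n == 3:
            tokens.append(["bold", ""])
        elif n == 4:
            # Assumed reading: one literal apostrophe, then bold.
            tokens.append(["text", "'"])
            tokens.append(["bold", ""])
        else:
            # Five or more: surplus apostrophes are literal, then bold + italic.
            if n > 5:
                tokens.append(["text", "'" * (n - 5)])
            tokens.append(["bold", ""])
            tokens.append(["italic", ""])

    # The decision is only checked once the whole line has been seen.
    italics = sum(1 for kind, _ in tokens if kind == "italic")
    bolds = sum(1 for kind, _ in tokens if kind == "bold")
    if italics % 2 == 1 and bolds % 2 == 1:
        # Assumed fallback: the first bold was really apostrophe + italic.
        for idx, tok in enumerate(tokens):
            if tok[0] == "bold":
                tokens[idx:idx + 1] = [["text", "'"], ["italic", ""]]
                break
    return [tuple(tok) for tok in tokens]


if __name__ == "__main__":
    print(resolve_quotes("'''bold''' and ''italic''"))
    print(resolve_quotes("L'''amour''"))
```

The point is that the first pass only collects markers; nothing is committed until the counts for the whole line are known, which is exactly the kind of look-ahead that a plain context-free grammar does not express and that forces a generated parser to backtrack.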
Steve
wikitech-l@lists.wikimedia.org