On 11/8/07, Nick Jenkins nickpj@gmail.com wrote:
My understanding (which could of course be wrong) is this:
- It is still a goal (at the moment the only complete wiki text
specification is the code that implements the parser).
- Progress towards this goal is stalled.
- The reason it is stalled is because it may not be possible (I think we
have a few 90% implementations, where people realised the last 10% could not be done using the approach used in the other 90%).
What exactly is the "goal"? If it's just "formally defining whatever it is that the code currently does", is that a worthy goal? I ask because obviously the parser currently does some crappy things, and it does some of them in a crappy way. Is it really worth carefully defining those crappy things just so that it can do them in a better way? Could we not just define the 90% of what it currently does that we actually *want* it to do, then write a parser that does that, and accept the fact that some wikitext *will* be broken?
For example, I just opened a bug on the strange, unpredictable, and not very useful behaviour that arises from constructs such as:
;#foo:blaa
And the fact that this:
;foo:#blaa
is distinct from this:
;foo :#blaa
This is all weird, very complex behaviour that arises from the nature of the parser. But it's not useful. And it's not really worth spending endless hours attempting to understand. Could we not simply redefine the grammar, and chop out the ";term:definition" syntax altogether? Yes, it would break some code, but it would be easy to identify those bits and replace them with ";term<newline>:definition".
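As a rough sketch of how mechanical that replacement could be, assuming a naive rule of "split at the first colon on any line that starts with a semicolon": the helper name split_inline_definitions is hypothetical and is not part of MediaWiki. It deliberately ignores colons inside links, URLs, or tags, which a real migration would have to handle, and whether the two-line form renders identically to the original would still need checking.

```python
import re

# Hypothetical helper, not part of MediaWiki.
# Matches a line that starts with ";" and contains a colon, capturing the
# term (without trailing whitespace) and everything after the first colon.
_INLINE_DEF = re.compile(r"^(;[^:\n]*?)\s*:(.*)$")


def split_inline_definitions(wikitext: str) -> str:
    """Rewrite ';term:definition' lines as ';term' followed by ':definition'."""
    out = []
    for line in wikitext.split("\n"):
        m = _INLINE_DEF.match(line)
        if m:
            out.append(m.group(1))        # the ";term" part
            out.append(":" + m.group(2))  # the ":definition" part
        else:
            out.append(line)              # untouched: no inline definition
    return "\n".join(out)


if __name__ == "__main__":
    for sample in (";#foo:blaa", ";foo:#blaa", ";foo :#blaa"):
        print(repr(split_inline_definitions(sample)))
```

Under this naive rule, ";foo:#blaa" and ";foo :#blaa" both come out as the same two lines, so the distinction between them disappears.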
If the goal is to improve the grammar and then write a good parser, why expend so much effort attempting to document to the nth degree the virtually incomprehensible behaviour currently exhibited by the parser?
Steve
PS: Lest I be misunderstood, I'm not criticising the parser or its authors: the parser code actually looks well written and extremely readable. It's just in the nature of multi-pass pattern matching/replacement that extraordinarily complex behaviour arises. And then you run the output through tidy...
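A toy illustration of the multi-pass point, not taken from the MediaWiki parser: two simple regex passes for bold and italic quote markup give different output depending on the order in which they run. The patterns and function names here are assumptions made up for the example.

```python
import re


def bold_pass(text: str) -> str:
    # Toy rule: '''x''' becomes <b>x</b>
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


def italic_pass(text: str) -> str:
    # Toy rule: ''x'' becomes <i>x</i>
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


source = "'''bold'''"

bold_first = italic_pass(bold_pass(source))
italic_first = bold_pass(italic_pass(source))

print(bold_first)    # <b>bold</b>
print(italic_first)  # <i>'bold</i>'
```

Each pass is trivial on its own; the surprising result only appears from how the passes interact, which is the hard part to write down as a grammar.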