I've just read the past couple of days of discussion, and would like to agree with Merlijn.
One of the points missed is that the pipe trick and many of the other "edge cases" are actually handled by pre-processing at save time, so the shorthand is not what ends up stored in the database.
The easy examples being:

* [[turkey (bird)|]] is stored as [[turkey (bird)|turkey]]
* [[stuff]]ing is stored as [[stuff|stuffing]]
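For anyone who wants to play with the idea, here is a minimal sketch of that save-time expansion. The function names and regexes are mine, not MediaWiki's actual code, and it only covers the two examples above:

import re

# A minimal sketch of the save-time expansion described above.  The
# function names and regexes are mine, not MediaWiki's actual code.

PIPE_TRICK = re.compile(r'\[\[([^|\]]+)\|\]\]')        # [[target|]]
LINK_TRAIL = re.compile(r'\[\[([^|\]]+)\]\]([a-z]+)')  # [[target]]trail

def pipe_trick_label(target):
    """Derive the label: drop a namespace prefix and a trailing
    parenthetical, the way the pipe trick does."""
    label = target.split(':', 1)[-1]                # "Help:Stuff" -> "Stuff"
    label = re.sub(r'\s*\([^)]*\)\s*$', '', label)  # "turkey (bird)" -> "turkey"
    return label.strip()

def expand_shorthand(wikitext):
    """Rewrite the two shorthands into the long form that gets stored."""
    wikitext = PIPE_TRICK.sub(
        lambda m: '[[%s|%s]]' % (m.group(1), pipe_trick_label(m.group(1))),
        wikitext)
    wikitext = LINK_TRAIL.sub(
        lambda m: '[[%s|%s%s]]' % (m.group(1), m.group(1), m.group(2)),
        wikitext)
    return wikitext

print(expand_shorthand('[[turkey (bird)|]]'))  # [[turkey (bird)|turkey]]
print(expand_shorthand('[[stuff]]ing'))        # [[stuff|stuffing]]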
Other such behaviors could be regularized without affecting the existing articles. Some years back I made some suggestions along these lines, but they were not accepted.
A case I was concerned with at the time was normalized pre-processing of [[stuff:]] versus [[:stuff]], [[|stuff]] versus [[stuff|]], and their combinations, such as [[:stuff (action)|]]. This is the kind of thing that could most easily be formalized.
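To make that concrete, here is one possible normalization pass for those corner cases. The specific rules below are just illustrative choices of canonical forms, not what I originally proposed and not current MediaWiki behavior:

import re

# One possible canonicalization of the colon/pipe corner cases above.
# The specific rules are for illustration only.

def canonicalize_link(inner):
    """Take the text between [[ and ]] and return a canonical form."""
    target, pipe, label = inner.partition('|')
    forced = target.startswith(':')    # [[:stuff]] forces a plain link
    target = target.strip(':')         # drop stray leading/trailing colons
    if pipe and not label:             # [[stuff|]]: apply the pipe trick
        label = re.sub(r'\s*\([^)]*\)\s*$', '', target.split(':', 1)[-1])
    if not target:                     # [[|stuff]]: placeholder rule,
        target, label = label, ''      # use the label as the target
    prefix = ':' if forced else ''
    return prefix + target + ('|' + label if label else '')

for case in ['stuff:', ':stuff', '|stuff', 'stuff|', ':stuff (action)|']:
    print('[[%s]]  ->  [[%s]]' % (case, canonicalize_link(case)))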
In regularizing the grammar, think about how the back-end data could be transformed into a new grammar for editing, and then transformed back into the back-end form on save. For example, take the // and ** ideas we've talked about multiple times over the years: there's no reason the database couldn't continue to store them as '' and '''. Or better, as <i> and <b>!
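A tiny sketch of what that round trip could look like, with // and ** as the editing surface and '' / ''' left untouched in storage. The delimiters are just examples, and naive string replacement would of course have to be smarter about things like // inside URLs:

# Sketch: the database keeps '' and ''' (or <i>/<b>); the editor is
# shown, and types, // and **.  Delimiters here are examples only, and a
# real implementation would have to skip // inside URLs and the like.

STORED_TO_EDIT = [("'''", "**"), ("''", "//")]   # ''' before '' matters

def to_edit_form(stored):
    """What the editor sees when opening the page."""
    for old, new in STORED_TO_EDIT:
        stored = stored.replace(old, new)
    return stored

def to_stored_form(edited):
    """What goes back into the database on save."""
    for old, new in STORED_TO_EDIT:
        edited = edited.replace(new, old)
    return edited

text = "''italic'' and '''bold''' words"
assert to_stored_form(to_edit_form(text)) == text   # lossless round trip
print(to_edit_form(text))   # //italic// and **bold** words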
If we stick to just front-end parsing, the project might be doable in our lifetimes.
===
And as a final note for the computer scientists, remember that we often use LR(1) and LALR(1) grammars, but RL(1) is also possible! MW syntax has often seemed to me more like RL....
(Yes, back in university we were all required to write a parser -- a year-long project. I've written several for later projects, too. But university was a very long time ago.)