I've just read the past couple of days of discussion, and would like to
agree with Merlijn.
One point that has been missed is that the pipe trick and many of the
other "end cases" are actually pre-processed when the page is saved;
the shorthand itself is not stored in the database.
The easy examples being:
* [[turkey (bird)|]] is stored as [[turkey (bird)|turkey]]
* [[stuff]]ing is stored as [[stuff|stuffing]]
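As a rough sketch of that kind of save-time pre-processing, here is a
small Python example. It only mirrors the two examples above and is not
taken from MediaWiki's code; the names PIPE_TRICK, LINK_TRAIL, label_for
and normalize_links are made up for illustration, and the rule for
deriving the label (drop a leading colon and a trailing parenthetical)
is my simplifying assumption.

```python
import re

# [[target|]] : a link whose label was left empty (the "pipe trick").
PIPE_TRICK = re.compile(r"\[\[([^\[\]|]+)\|\]\]")
# [[target]]trail : a plain link immediately followed by lowercase letters.
LINK_TRAIL = re.compile(r"\[\[([^\[\]|]+)\]\]([a-z]+)")


def label_for(target):
    """Derive a display label: drop a leading ':' and a trailing '(...)'."""
    label = target.lstrip(":")
    return re.sub(r"\s*\([^()]*\)$", "", label)


def normalize_links(text):
    """Rewrite the shorthand forms into explicit [[target|label]] links."""
    text = PIPE_TRICK.sub(
        lambda m: "[[%s|%s]]" % (m.group(1), label_for(m.group(1))), text
    )
    text = LINK_TRAIL.sub(
        lambda m: "[[%s|%s%s]]" % (m.group(1), m.group(1), m.group(2)), text
    )
    return text


print(normalize_links("[[turkey (bird)|]]"))
print(normalize_links("[[stuff]]ing"))
```

Because the shorthand is rewritten before storage, whatever reads the
stored text later only ever has to handle the explicit form.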
Other such behaviors could be regularized in the same way, without
affecting the existing articles. Some years back, I made some
suggestions along these lines, but they were not accepted.
A case I was concerned with at the time was normalized pre-processing
of [[stuff:]] versus [[:stuff]], and [[|stuff]] versus [[stuff|]],
and their combinations -- [[:stuff (action)|]]. This is the kind of
thing that could most easily be formalized.
In regularizing the grammar, think about how the back-end data could be
converted to a new grammar for editing, and then converted back to the
back-end form when it is stored. For example, the // and ** ideas we've
talked about multiple times over the years. No reason that the database
couldn't continue to store them as '' and '''. Or better as <i> and <b>!
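To make the round trip concrete, here is a minimal Python sketch of a
front-end layer that shows // and ** to the editor and stores '' and '''
in the back end. The function names editing_to_stored and
stored_to_editing are hypothetical, and the regexes are deliberately
naive: they ignore nesting, line-start ** list markup, and nowiki
sections, and only make a token effort to leave the // in URLs alone.

```python
import re


def editing_to_stored(text):
    """Convert the proposed editing syntax to the stored back-end syntax."""
    # **bold** -> '''bold'''
    text = re.sub(r"\*\*(.+?)\*\*", r"'''\1'''", text)
    # //italic// -> ''italic''; the lookbehind skips the // in "http://".
    text = re.sub(r"(?<!:)//(.+?)//", r"''\1''", text)
    return text


def stored_to_editing(text):
    """Convert stored back-end syntax to the proposed editing syntax."""
    # Handle the three-quote form first so it is not read as italics.
    text = re.sub(r"'''(.+?)'''", r"**\1**", text)
    text = re.sub(r"''(.+?)''", r"//\1//", text)
    return text


draft = "a //b// and **c**"
stored = editing_to_stored(draft)
print(stored)
print(stored_to_editing(stored) == draft)
```

The point is only that both directions are mechanical, so existing
articles would not need to change in the database.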
If we stick to just front-end parsing, the project might be doable in
our lifetimes.
===
And as a final note for the computer scientists, remember that we often
use LR(1) and LALR(1) grammars, but RL(1) is also possible! MW syntax
has often seemed to me more like RL....
(Yes, back in university we were all required to write a parser -- a
year-long project. I've written several for later projects, too.
But university was a very long time ago.)