2010-09-27 22:46, Paul Houle wrote:
> On 9/27/2010 2:58 PM, Chad wrote:
>> This. Tim sums up the consensus very well with that commit summary. He also made some comments on the history of wikitext and alternative parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly at '"Parser" is a convenient and short name for it').
>> While a real parser is a nice pipe dream, in practice not a single project to "rewrite the parser" has succeeded in all the years people have been trying. Like Aryeh says, if you can pull it off and make it practical, hats off to you.
> For my own IX work I've written a MediaWiki markup parser in C#
> based on the Irony framework. It fails to parse about 0.5% of pages in Wikipedia
What do you mean by "fail"? Does it assign slightly incorrect semantics to a construction? Does it fail to accept the input? Does it crash?
> and is oblivious to a lot of the stranger stuff [like the HTML intrusions], but it does a good job of eating infoboxes and making sense of internal and external links. Now, the strange stuff + the parse fails would probably be impossible to handle in a rational way...
I disagree. I believe that there is a rational way to handle all kinds of input.
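To make that concrete, here is a minimal sketch in C# (hypothetical code, not Paul's Irony-based parser and not MediaWiki's actual algorithm; the names TotalWikiScanner and Node are invented for illustration). The point is simply that the scanner has a total fallback rule: anything that is not a well-formed [[link]] or {{template}} is consumed as plain text, so there is no input it rejects.

using System;
using System.Collections.Generic;
using System.Text;

enum NodeKind { Text, InternalLink, Template }

sealed record Node(NodeKind Kind, string Value);

static class TotalWikiScanner
{
    // Scans wikitext into a flat list of nodes. There is deliberately no
    // error path: any span that is not a well-formed [[link]] or
    // {{template}} is emitted as plain text, so every input produces a
    // result. Example:
    //   Scan("See [[Main Page]] and {{Infobox|x=1}} plus a stray [[.")
    // yields Text, InternalLink, Template and Text nodes and never throws.
    public static List<Node> Scan(string src)
    {
        var nodes = new List<Node>();
        var text = new StringBuilder();
        int i = 0;
        while (i < src.Length)
        {
            if (TryDelimited(src, i, "[[", "]]", out string inner, out int next))
            {
                Flush(nodes, text);
                nodes.Add(new Node(NodeKind.InternalLink, inner));
                i = next;
            }
            else if (TryDelimited(src, i, "{{", "}}", out inner, out next))
            {
                Flush(nodes, text);
                nodes.Add(new Node(NodeKind.Template, inner));
                i = next;
            }
            else
            {
                text.Append(src[i]);   // malformed markup degrades to text
                i++;
            }
        }
        Flush(nodes, text);
        return nodes;
    }

    static void Flush(List<Node> nodes, StringBuilder text)
    {
        if (text.Length == 0) return;
        nodes.Add(new Node(NodeKind.Text, text.ToString()));
        text.Clear();
    }

    // True if src[start..] begins with `open` and a matching `close` follows;
    // returns the inner content and the index just past `close`.
    static bool TryDelimited(string src, int start, string open, string close,
                             out string inner, out int next)
    {
        inner = ""; next = start;
        if (start + open.Length > src.Length ||
            string.CompareOrdinal(src, start, open, 0, open.Length) != 0)
            return false;
        int end = src.IndexOf(close, start + open.Length, StringComparison.Ordinal);
        if (end < 0) return false;                  // unterminated: caller treats it as text
        inner = src.Substring(start + open.Length, end - start - open.Length);
        next = end + close.Length;
        return true;
    }
}

With a fallback like that, "failure" stops being an outcome: malformed markup simply comes back as the literal text it is, much as browsers cope with broken HTML, and the interesting question becomes how faithful the recovered structure is, not whether you get one at all.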
/Andreas