On 9/28/2010 3:53 AM, Andreas Jonsson wrote:
For my own IX work I've written a MediaWiki markup parser in C# based on the Irony framework. It fails to parse about 0.5% of the pages on Wikipedia.
What do you mean by "fail"? Does it assign slightly incorrect semantics to a construction? Does it fail to accept the input? Does it crash?
Fails to accept input -- that is, the text doesn't match the grammar.
Now, the toolchain above the parser is currently getting between 30% and 80% recall on the tasks it has to do, so improving the grammar isn't the highest priority on my list.
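
For reference, here is a minimal sketch of what "fails to accept the input" looks like with Irony's Parser/ParseTree API. The WikiGrammar below is just a hypothetical stand-in, not the actual MediaWiki grammar:

using System;
using Irony.Parsing;

// Hypothetical stand-in grammar; the real MediaWiki grammar is not shown here.
public class WikiGrammar : Grammar
{
    public WikiGrammar()
    {
        // Trivial rule just to have something parseable: one or more quoted strings.
        var text = new StringLiteral("text", "\"");
        var document = new NonTerminal("document");
        document.Rule = MakePlusRule(document, text);
        Root = document;
    }
}

public static class ParseCheck
{
    public static void Main()
    {
        var parser = new Parser(new WikiGrammar());
        ParseTree tree = parser.Parse("\"some wiki text\"");

        // "Fails to accept the input" shows up as parser errors on the tree:
        if (tree.HasErrors())
        {
            foreach (var msg in tree.ParserMessages)
                Console.WriteLine(msg.Location + ": " + msg.Message);
        }
        else
        {
            Console.WriteLine("Parsed OK, root term: " + tree.Root.Term.Name);
        }
    }
}

If the tree has errors, the input simply didn't match the grammar; the ParserMessages collection tells you where the parse broke down.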