"Jay Ashworth" jra@baylink.com wrote in message news:32162150.4910.1294292017738.JavaMail.root@benjamin.baylink.com...
----- Original Message -----
The thing you want expanded, George, is "Last Five Percent"; I refer there to (I think it was) David Gerard's comment earlier that the first 95% of wikisyntax fits reasonably well into current parser-building frameworks, and the last 5% causes well-adjusted programmers to consider heroin... or something like that. :-)
The argument advanced was always "there's too much usage of that ugly stuff to consider Just Not Supporting It", and I always asked whether anyone with larger computers than mine had ever extracted actual statistics, and no one ever answered.
This is a key point. Every other parser discussion has foundered *before* reaching the stage of saying "here is a working parser which does *something* interesting, now we can see how it behaves". Everyone before has got to that last 5% and said "I can't make this work; I can do *this*, which is kinda similar, but when you combine it with *this* and *that* and *the other* we're now in a totally different set of edge cases". And stopped there. Obviously it's impossible to quantify all the edge cases of the current parser *because of* the lack of a schema, but until we actually get a new parser churning through real wikitext, we have no way to say whether those edge cases make up 5%, 0.5% or 50% of the corpus that's out there.
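For what it's worth, the kind of statistic being asked for could be gathered with something as small as the sketch below. It is only an illustration: the two detector patterns and the names `DETECTORS` and `edge_case_share` are made up for the example and are not a vetted list of the constructs that actually trouble the parser, and regexes can only approximate what a real parser would report.

```python
import re
from collections import Counter

# Placeholder detectors (assumptions for illustration only, NOT a vetted
# list of the real parser edge cases).
DETECTORS = {
    # four or more apostrophes in a row: ambiguous bold/italic nesting
    "long_apostrophe_run": re.compile(r"'{4,}"),
    # a table opener appearing inside a template that is still open
    "table_start_after_template_open": re.compile(r"\{\{[^{}]*\{\|"),
}

def edge_case_share(pages, detectors=DETECTORS):
    """Return (share of pages hitting any detector, per-detector page counts)."""
    total = 0
    flagged = 0
    hits = Counter()
    for text in pages:
        total += 1
        matched = [name for name, rx in detectors.items() if rx.search(text)]
        hits.update(matched)  # count each detector at most once per page
        if matched:
            flagged += 1
    share = flagged / total if total else 0.0
    return share, dict(hits)

# Tiny stand-in corpus; a real run would stream pages from a dump.
sample = [
    "Plain ''italic'' and '''bold''' text.",
    "Ambiguous '''''mixed'''' emphasis.",
    "{{Infobox start {| class=wikitable",
    "Nothing unusual here.",
]
share, hits = edge_case_share(sample)
print(f"{share:.1%} of pages flagged", hits)
```

Swapping the regexes for "the new parser raised an error or disagreed with the old output" would give the number that actually matters; the counting stays the same.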
--HM