On 11/14/07, David Gerard dgerard@gmail.com wrote:
> We can iteratively approach it. "Here's 99%. What's the breakage level?" "This, this and this need to work." *fixfixfix*
Yep. Except that the 99% we've been bandying about is effectively the "non-breakage" level - the proportion of corpus pages that render correctly - as opposed to the proportion of the imaginary grammar we think we've implemented.
Well, I'm chugging along here on the grammar, feeling pretty optimistic. The current parser can certainly do crazy things, but if we accept that those crazy things are not for the most part useful, and that no one will mind if they go away, the task becomes easier.
It's interesting to note the language features that were actually difficult to implement in the pattern-transform parser.
For example, __TOC__ is not particularly straightforward:
1. Search for the first occurrence.
2. Replace it with a special token.
3. Remove all other occurrences.
...
4. Find the special token and replace it with the actual contents.
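The steps above can be sketched roughly like this (a hypothetical Python illustration, not MediaWiki's actual code; the function name and sentinel string are made up):

```python
import re

# Hypothetical sentinel token, chosen to be unlikely to occur in real wikitext.
TOC_MARKER = "\x07TOC\x07"

def handle_toc_pattern_transform(text, toc_html):
    """Pattern-transform handling of __TOC__: multiple passes over the text."""
    # 1. Replace only the first occurrence with the special token.
    text = re.sub(r"__TOC__", TOC_MARKER, text, count=1)
    # 2. Remove all other occurrences.
    text = text.replace("__TOC__", "")
    # 3. Later, once the TOC contents are known, swap the token for them.
    text = text.replace(TOC_MARKER, toc_html)
    return text

handle_toc_pattern_transform("__TOC__ a __TOC__ b", "<toc/>")
# → "<toc/> a  b"
```

Note that each step is a separate pass over the whole text, which is where the awkwardness comes from.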
Dealing with <nowiki> is even more tortured.
However, in a consume-parse-render parser, it's simpler:
1. Find the token just like any other token.
2. If the "magicword-TOC-found" flag is already set, skip further processing.
3. Set the magicword-TOC-found flag.
4. Store the token as normal.
Then at render time:
1. When you hit the token, write out the contents. Presumably you already have enough information to do this.
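A minimal sketch of the consume-parse-render approach (again hypothetical Python; the token representation and function names are assumptions, not the real parser's API):

```python
import re

def tokenize(wikitext):
    """Tokenize pass: __TOC__ becomes a token; only the first one is kept."""
    tokens = []
    toc_seen = False  # the "magicword-TOC-found" flag
    pos = 0
    for m in re.finditer(r"__TOC__", wikitext):
        if m.start() > pos:
            tokens.append(("text", wikitext[pos:m.start()]))
        if not toc_seen:
            # First occurrence: store the token as normal and set the flag.
            tokens.append(("toc", None))
            toc_seen = True
        # Subsequent occurrences: flag already set, skip further processing.
        pos = m.end()
    if pos < len(wikitext):
        tokens.append(("text", wikitext[pos:]))
    return tokens

def render(tokens, toc_html):
    """Render pass: when we hit the TOC token, write out the contents."""
    return "".join(toc_html if kind == "toc" else value
                   for kind, value in tokens)
```

The duplicate-removal and placeholder-substitution passes disappear: the flag handles duplicates during the single tokenize pass, and the render pass fills in the contents.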
So not all of the parser conversion is bad news.
Steve