On 11/9/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
> According to flex documentation, it's perfectly happy to accept any regex for tokens, and will use unlimited lookahead and backtracking if necessary. It provides debug info allowing you to check for and eliminate backtracking, if you want to speed it up, but that's optional. Clearly it's not possible to tokenize MW markup with one-character lookahead: you sure can't tell the difference between a second- and sixth-level heading, and of course that's even ignoring
Yes you can, if ====== is a token. Which at first glance, it should be. The fact that == looks like === looks like ==== is neither here nor there to the grammar - it's a handy mnemonic for humans, that's all.
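
To make that concrete, here is a rough sketch of longest-match tokenizing, which is the rule flex applies when several patterns match at the same position. It is written in Python rather than flex, and the rule names (H1 to H6, TEXT, NEWLINE) and the `tokenize` helper are made up for illustration; none of this is taken from the MediaWiki parser.

```python
import re

# Hypothetical rule table: one token per heading level, plus plain text.
RULES = [("H%d" % n, re.compile("=" * n)) for n in range(1, 7)]
RULES.append(("TEXT", re.compile(r"[^=\n]+")))
RULES.append(("NEWLINE", re.compile(r"\n")))

def tokenize(text):
    """Split text into (name, lexeme) pairs using the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError("no rule matches at position %d" % pos)
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("====== Deep ======"))
print(tokenize("== Top =="))
```

The grammar then only ever sees H2 or H6 as distinct tokens. A run of seven or more equals signs comes out as an H6 followed by shorter tokens, and what that should mean is a separate decision for the grammar.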
> stuff like ISBN handling that's less basic and more disposable.
What's wrong with ISBN handling? I don't see anything problematic in an "ISBN" token that consumes a following sequence of digits, possibly with hyphens and crap.
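
Something along these lines is what I have in mind. It is only a sketch in Python: the pattern and the `find_isbn` helper are my own guesses for illustration, not the rule MediaWiki actually uses, and it does no checksum validation.

```python
import re

# Hypothetical pattern: the word ISBN, whitespace, then digits and hyphens,
# ending in a digit or an X check character.
ISBN_TOKEN = re.compile(r"ISBN\s+(\d[\d-]*[\dXx])")

def find_isbn(text):
    """Return the first ISBN in text with hyphens removed, or None."""
    m = ISBN_TOKEN.search(text)
    if m is None:
        return None
    return m.group(1).replace("-", "")

print(find_isbn("See ISBN 0-306-40615-2 for details"))
```

The token consumes everything up to the last digit, so the grammar never has to look inside it.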
Is there a definitive list of the real problems with the current "grammar"? We've mentioned two so far: bold/italic apostrophes, and nested lists. I imagine many of the ambiguities relating to | in templates, parser functions, tables and the like would be a third. What else is there?
Steve