On 11/9/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
> I suspect a major problem might arise if there are constructs that require more than one-token lookahead. There probably are, and apparently bison et al. can't parse those. But again, I would defer
I think it would be a good idea to formalise and improve the grammar so that this is no longer the case. Does any sane grammar need more than one token of lookahead?
> # If there is an odd number of both bold and italics, it is likely
> # that one of the bold ones was meant to be an apostrophe followed
> # by italics. Which one we cannot know for certain, but it is more
> # likely to be one that has a single-letter word before it.
This is a good example. There is no grammar, therefore no spec, therefore the parser can do whatever it wants. What it actually does is try to guess. No one has ever really defined what the following string represents: '''''
There are many answers to the question, depending on the context. It's horrible. It shouldn't be like that. There are solutions:
- Distinct sequences for italics and bold (**this** being the obvious choice for bold)
- Specific tokens for bold, italics, and bold-italics, so that this: '''''Some''' word'' is no longer valid. Instead you would write '''''Some''''' ''word''.
- Strong escaping mechanisms such that the parser deliberately gives up very early on, and if you want bold-italic apostrophes, you're going to have to escape them.

Making ''''''foo'''''' deliberately render as bold-italics 'foo' is madness*. Cute for a lolcode or Intercal, but for MediaWiki?
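
To make the second option concrete, here is a rough sketch of what "specific tokens" could look like, combined with the third option's habit of giving up early on odd runs. Everything in it is an assumption for illustration: the token names ITALIC, BOLD and BOLD_ITALIC, the tokenize() and render() functions, and the rule that a span may not open inside another span sharing one of its styles. None of this is MediaWiki's actual parser.

```python
import re
from html import escape

# Hypothetical token names, keyed by the length of the apostrophe run.
MARKERS = {2: "ITALIC", 3: "BOLD", 5: "BOLD_ITALIC"}
# For each token: the styles it switches on, its opening HTML, its closing HTML.
STYLES = {
    "ITALIC": ({"i"}, "<i>", "</i>"),
    "BOLD": ({"b"}, "<b>", "</b>"),
    "BOLD_ITALIC": ({"b", "i"}, "<b><i>", "</i></b>"),
}


def tokenize(line):
    """Turn a line into (kind, text) pairs; every apostrophe run is one token."""
    tokens = []
    for piece in re.split(r"('+)", line):
        if not piece:
            continue
        if not piece.startswith("'") or len(piece) == 1:
            tokens.append(("TEXT", piece))  # plain text or a lone apostrophe
        elif len(piece) in MARKERS:
            tokens.append((MARKERS[len(piece)], piece))
        else:
            # Give up early: runs of 4, or of 6 and more, would need escaping.
            raise ValueError(f"ambiguous run of {len(piece)} apostrophes")
    return tokens


def render(line):
    """Render one line to HTML, rejecting anything that would need guessing."""
    out, stack = [], []
    for kind, text in tokenize(line):
        if kind == "TEXT":
            out.append(escape(text, quote=False))
        elif stack and stack[-1] == kind:
            stack.pop()  # the same token closes the innermost open span
            out.append(STYLES[kind][2])
        elif any(STYLES[kind][0] & STYLES[open_kind][0] for open_kind in stack):
            raise ValueError(f"{kind} cannot open inside an overlapping span")
        else:
            stack.append(kind)
            out.append(STYLES[kind][1])
    if stack:
        raise ValueError(f"unclosed {stack[-1]} span")
    return "".join(out)


print(render("'''''Some''''' ''word''"))  # <b><i>Some</i></b> <i>word</i>
```

With these rules, '''''Some''' word'' raises an error instead of being guessed at, and so does ''''''foo''''''. One token of lookahead is never exceeded, because each apostrophe run is classified on its own.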
Steve

* Well, it would be logical madness if it actually rendered like that. For some reason the first apostrophe renders as neither bold nor italics. So it's illogical madness :)