On 11/15/07, Jay R. Ashworth jra@baylink.com wrote:
L'''idée'' <- apostrophe followed by italics
L''''idée''' <- apostrophe followed by bold
That's a *requirement* to continue to properly handle French and Italian text. The current apostrophe pass handler uses, I believe, a lookahead and then goes backwards, which is a fairly sane way of doing this. If EBNF can't handle it, then forget EBNF.
Can someone tell me why bold and italics are considered *part of the spelling of the word* (which seems to be what you're implying here)?
I've never seen that to be the case in any character-based natural language.
I think it's more that L''''idee'' is a commonly used idiom. It's not part of the "spelling of the word", whatever that means.
I wonder whether it's possible to handle some of these idioms directly:
Foo'''bar: definitely a bold-toggle.
A'''bar: definitely an apostrophe followed by an italic-toggle.
That means that [ '''hello a'''bar ] will render as <b>hello a'<i>bar</i></b>, which is surprising, but if no one currently uses that construct, maybe we can get away with it.
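A minimal Python sketch of those two rules, to make the surprising case concrete. It is not how doQuotes() works. The function name render_triple_quotes and the test used to tell the two idioms apart (a run of exactly three apostrophes directly preceded by a one-letter word) are assumptions made for illustration.

```python
import re


def render_triple_quotes(text):
    """Render runs of exactly three apostrophes using two fixed rules.

    Assumed rule (hypothetical, not from the parser): a run directly
    preceded by a one-letter word is a literal apostrophe plus an
    italic toggle; any other run is a bold toggle.
    """
    out = []
    open_tags = []  # currently open tags, outermost first
    pos = 0

    def toggle(tag):
        # Close the tag if it is open, otherwise open it.
        # This sketch does not repair misnested tags mid-text.
        if tag in open_tags:
            open_tags.remove(tag)
            return "</%s>" % tag
        open_tags.append(tag)
        return "<%s>" % tag

    # Match runs of exactly three apostrophes, not part of a longer run.
    for m in re.finditer(r"(?<!')'{3}(?!')", text):
        out.append(text[pos:m.start()])
        before = text[:m.start()]
        if re.search(r"(?:^|\W)\w$", before):
            # One-letter word such as "a" or "L": apostrophe + italics.
            out.append("'" + toggle("i"))
        else:
            out.append(toggle("b"))
        pos = m.end()
    out.append(text[pos:])

    # Close whatever is still open at end of input, innermost first.
    for tag in reversed(open_tags):
        out.append("</%s>" % tag)
    return "".join(out)


print(render_triple_quotes("'''hello a'''bar"))
# <b>hello a'<i>bar</i></b>
```

With this rule the a'''bar clump never closes the bold span, which is exactly the surprise described above.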
Similarly, it might be worth investigating exactly what mid-word multi-apostrophic constructs are used (yes, Jay, like you suggested...). In French, d'* and l'* are used, and I guess an arbitrary number of others with diminishing likelihood: qu'*, jusqu'*, s'*, and even m'*, t'*, etc.
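One way such a list could feed a rule is a prefix check on the word just before the apostrophe clump. The sketch below uses only the prefixes named above, which are illustrative and incomplete; ELISION_PREFIXES and is_elision_before are hypothetical names.

```python
import re

# Prefixes named in the message; illustrative and incomplete (assumption).
ELISION_PREFIXES = ("d", "l", "qu", "jusqu", "s", "m", "t")

# The prefix must be a whole word: preceded by start of text or a non-word char.
_PREFIX_RE = re.compile(
    r"(?:^|\W)(?:%s)$" % "|".join(ELISION_PREFIXES), re.IGNORECASE
)


def is_elision_before(text, index):
    """Return True if the word ending just before text[index] is a listed prefix."""
    return _PREFIX_RE.search(text[:index]) is not None


print(is_elision_before("L'''idée''", 1))   # True
print(is_elision_before("Foo'''bar", 3))    # False
```

A survey of real wikitext would be needed before trusting any such list.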
I hate the parser's (doQuotes()) current approach of trying to second-guess what the user wants: we should be dictating the grammar, and either they are using a rule we specify, or they aren't. I don't really care how complicated the rules get, but we should be able to define them, stick them on a wall, and tell people: if you're not using one of these rules, you're going to get garbage.
Anyway. If we follow the approach I mentioned earlier, then all we have to do is parse apostrophe clumps as a single unit, then sort them out in a second step. Hopefully we can call that function something cute like resolveApostrophalypticChaos() or something. It doesn't really have much impact on the grammar apart from that, so we've probably discussed it enough.
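A rough Python sketch of that two-step shape, assuming the grammar only needs to recognise "a run of two or more apostrophes" as one token. The resolution policy in the second pass is a placeholder chosen for illustration, not a description of what doQuotes() does, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Pass 1: split text into ("text", s) and ("quotes", n) tokens.

    Any run of two or more apostrophes becomes one "quotes" token carrying
    its length; a lone apostrophe stays ordinary text.
    """
    tokens = []
    for part in re.split(r"('{2,})", text):
        if not part:
            continue
        if part.startswith("''"):
            tokens.append(("quotes", len(part)))
        else:
            tokens.append(("text", part))
    return tokens


def resolve_apostrophalyptic_chaos(tokens):
    """Pass 2: turn each clump into literal apostrophes plus toggles.

    Placeholder policy (an assumption): 2 = italic, 3 = bold, 5 = both,
    4 = one literal apostrophe + bold, more than 5 = surplus literal
    apostrophes + both.
    """
    resolved = []
    for kind, value in tokens:
        if kind == "text":
            resolved.append(("text", value))
            continue
        if value == 4:
            literal, marks = 1, 3
        elif value > 5:
            literal, marks = value - 5, 5
        else:
            literal, marks = 0, value
        if literal:
            resolved.append(("text", "'" * literal))
        if marks in (2, 5):
            resolved.append(("toggle", "i"))
        if marks in (3, 5):
            resolved.append(("toggle", "b"))
    return resolved


print(resolve_apostrophalyptic_chaos(tokenize("L''''idée'''")))
# [('text', 'L'), ('text', "'"), ('toggle', 'b'), ('text', 'idée'), ('toggle', 'b')]
```

Context-sensitive rules such as the French idioms would live entirely in the second pass; the tokenizer stays the same whatever rules end up on the wall.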
Steve