On Thu, Nov 15, 2007 at 03:56:21PM +1100, Steve Bennett wrote:
On 11/15/07, Jay R. Ashworth jra@baylink.com wrote:
L'''idée'' <- apostrophe followed by italics
L''''idée''' <- apostrophe followed by bold
That's a *requirement* to continue to properly handle French and Italian text. The current apostrophe pass handler uses, I believe, a lookahead and then goes backwards, which is a fairly sane way of doing this. If EBNF can't handle it, then forget EBNF.
Can someone tell me why bold and italics are considered *part of the spelling of the word* (which seems to be what you're implying here)?
I've never seen that to be the case in any character-based natural language.
I think it's more that L''''idee'' is a commonly used idiom. It's not part of the "spelling of the word", whatever that means.
If it's a *requirement* that we be able to produce a certain text rendering of a word, then it is no longer merely a rendering, it's part of the spelling of the word -- something without which it's not the same word.
Similarly, it might be worth investigating exactly what mid-word multi-apostrophic constructs are used (yes, Jay, like you suggested...). In French, d'* and l'* are used, and I guess an arbitrary number of others with diminishing likelihood: qu'*, jusqu'*, s'*, and even m'*, t'*, etc.
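One rough way to run that investigation would be to scan existing wikitext for an elided prefix immediately followed by a run of three or more apostrophes. The sketch below is in Python purely for illustration; the prefix list is just the handful named above, and the names ELISION_PREFIXES, PATTERN and find_elision_markup are made up for this example, not anything in the parser.

```python
import re

# Assumed prefix list: only the French elisions named in the message above.
ELISION_PREFIXES = ("d", "l", "qu", "jusqu", "s", "m", "t")

# Longest prefixes first so that "jusqu" is tried before "qu".
_ALTERNATION = "|".join(sorted(ELISION_PREFIXES, key=len, reverse=True))

# A prefix at a word boundary, then 3+ apostrophes, then a word character.
PATTERN = re.compile(
    r"\b(?P<prefix>" + _ALTERNATION + r")(?P<quotes>'{3,})(?=\w)",
    re.IGNORECASE,
)


def find_elision_markup(text):
    """Return (prefix, apostrophe_count) for each mid-word apostrophe run."""
    return [
        (m.group("prefix"), len(m.group("quotes")))
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "L'''idée'' et L''''idée'''"
    print(find_elision_markup(sample))
```

Tallying the (prefix, count) pairs over a dump would show which constructs actually occur and how often, which is the data needed before fixing any rule.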
I hate the parser's (doQuotes()) current approach of trying to second-guess what the user wants: we should be dictating the grammar, and either they are using a rule we specify, or they aren't. I don't really care how complicated the rules get, but we should be able to define them, stick them on a wall, and tell people: if you're not using one of these rules, you're going to get garbage.
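To make "stick them on a wall" concrete, here is a minimal sketch of what a fixed rule table could look like. This is not how doQuotes() behaves; the table contents, the token names and classify_run are all hypothetical, and the table settles the "L'''x" ambiguity (bold, or apostrophe plus italics) by fiat in favour of the latter.

```python
# Hypothetical rule table: (follows_elision_prefix, run_length) -> tokens.
# Anything not listed here is, by definition, not a supported construct.
RULES = {
    (False, 1): ["apostrophe"],
    (False, 2): ["italic"],
    (False, 3): ["bold"],
    (False, 5): ["bold", "italic"],
    (True, 1): ["apostrophe"],
    (True, 3): ["apostrophe", "italic"],
    (True, 4): ["apostrophe", "bold"],
    (True, 6): ["apostrophe", "bold", "italic"],
}


def classify_run(run_length, follows_elision_prefix):
    """Look up an apostrophe run; return None when no rule applies."""
    return RULES.get((follows_elision_prefix, run_length))


if __name__ == "__main__":
    print(classify_run(3, True))   # a run of three after an elided prefix
    print(classify_run(4, False))  # a run of four with no prefix: no rule
```

The point of the lookup is that there is no guessing: a run either matches an entry or it does not, and the caller decides what "garbage" looks like for the unmatched case.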
Well, it will be interesting to see how that plays in Peoria, yes. :-)
Cheers, -- jra