On 11/15/07, Jay R. Ashworth <jra(a)baylink.com> wrote:
L'''idée'' <- apostrophe followed by italics
L''''idée''' <- apostrophe followed by bold
That's a *requirement* to continue to properly handle French and Italian
text. The current apostrophe pass handler uses, I believe, a lookahead and
then goes backwards, which is a fairly sane way of doing this. If EBNF
can't handle it, then forget EBNF.
Can someone tell me why bold and italics are considered *part of the
spelling of the word* (which seems to be what you're implying here)?
I've never seen that to be the case in any character-based natural
language.
I think it's more that L'''idee'' is a commonly used idiom. It's not part of
the "spelling of the word", whatever that means.
I wonder whether it's possible to handle some of these idioms directly:
Foo'''bar: definitely a bold-toggle.
A'''bar: definitely an apostrophe followed by an italic-toggle.
That means that [ '''hello a'''bar ] will render as
<b>hello a'<i>bar</i></b>, which is surprising, but if no one currently uses
that construct, maybe we can get away with it.
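
To make the two idioms above concrete, here is a rough Python sketch. It
assumes that "A" stands for any one-letter word (l', d', a') and that only
runs of exactly three apostrophes are in question. The name
classify_triples is made up for illustration; this is not what doQuotes()
does.

```python
import re

# A run of exactly three apostrophes (not part of a longer run).
TRIPLE = re.compile(r"(?<!')'''(?!')")

def classify_triples(text):
    """Return (offset, kind) for every run of exactly three apostrophes.

    Assumed rule: directly after a one-letter word the run is a literal
    apostrophe plus an italic toggle; anywhere else it is a bold toggle.
    """
    results = []
    for match in TRIPLE.finditer(text):
        before = text[:match.start()]
        # The run of letters immediately preceding the apostrophes, if any.
        word = re.search(r"[^\W\d_]+$", before)
        if word and len(word.group()) == 1:
            results.append((match.start(), "apostrophe+italic"))
        else:
            results.append((match.start(), "bold"))
    return results

print(classify_triples("'''hello a'''bar"))
# [(0, 'bold'), (10, 'apostrophe+italic')]
```

The second run is read as apostrophe plus italics because it follows the
one-letter word "a", which is exactly the surprising case described above.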
Similarly, it might be worth investigating exactly what mid-word
multi-apostrophic constructs are used (yes, Jay, like you suggested...). In
French, d'* and l'* are used, and I guess an arbitrary number of others with
diminishing likelihood: qu'*, jusqu'*, s'*, and even m'*, t'*, etc.
I hate the parser's (doQuotes()) current approach of trying to second-guess
what the user wants: we should be dictating the grammar, and either they are
using a rule we specify, or they aren't. I don't really care how complicated
the rules get, but we should be able to define them, stick them on a wall,
and tell people: if you're not using one of these rules, you're going to get
garbage.
Anyway. If we follow the approach I mentioned earlier, then all we have to
do is parse apostrophe clumps as a single unit, then sort them out in a
second step. Hopefully we can call that function something cute like
resolveApostrophalypticChaos(). It doesn't really have much
impact on the grammar apart from that, so we've probably discussed it
enough.
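
For the record, the two-step idea could look roughly like the Python sketch
below. The first step keeps every apostrophe clump as one opaque token, and
the second step decides what each clump means. The function names and the
placeholder policy (2 = italic, 3 = bold, 5 = both, surplus apostrophes
literal) are assumptions for illustration only, not the current parser's
behaviour.

```python
import re

# Assumption: a "clump" is any run of two or more apostrophes.
CLUMP = re.compile(r"('{2,})")

def tokenize(text):
    """Step 1: split into plain text and apostrophe clumps, uninterpreted."""
    tokens = []
    for piece in CLUMP.split(text):
        if not piece:
            continue  # re.split yields empty strings at the edges
        kind = "clump" if CLUMP.fullmatch(piece) else "text"
        tokens.append((kind, piece))
    return tokens

def resolve_apostrophe_clumps(tokens):
    """Step 2: turn each clump into literal apostrophes and/or toggles."""
    out = []
    for kind, value in tokens:
        if kind == "text":
            out.append(("text", value))
            continue
        n = len(value)
        if n == 2:
            literal, toggles = 0, ["italic"]
        elif n in (3, 4):
            literal, toggles = n - 3, ["bold"]
        else:
            literal, toggles = n - 5, ["bold", "italic"]
        if literal:
            # Surplus apostrophes come first, as ordinary text.
            out.append(("text", "'" * literal))
        out.extend(("toggle", name) for name in toggles)
    return out

print(resolve_apostrophe_clumps(tokenize("L''''idée'''")))
# [('text', 'L'), ('text', "'"), ('toggle', 'bold'),
#  ('text', 'idée'), ('toggle', 'bold')]
```

Context-sensitive rules (such as treating ''' after a one-letter word as
apostrophe plus italics) would live entirely in the second function, so the
grammar itself only ever sees "text" and "clump".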
Steve