Re: [Wikitech-l] Parser practicum

15 Nov 2007

On 11/15/07, Jay R. Ashworth <jra(a)baylink.com> wrote:
...

 L'''idée'' <- apostrophe followed by italics
 L''''idée''' <- apostrophe followed by bold

 That's a *requirement* to continue to properly handle French and Italian
 text. The current apostrophe pass handler uses, I believe, a lookahead and
 then goes backwards, which is a fairly sane way of doing this. If EBNF
 can't handle it, then forget EBNF.

 Can someone tell me why bold and italics are considered *part of the
 spelling of the word* (which seems to be what you're implying here)?

 I've never seen that to be the case in any character-based natural
 language.

 I think it's more that L''''idee'' is a commonly used idiom. It's not part
 of the "spelling of the word", whatever that means.

I wonder whether it's possible to handle some of these idioms directly:

Foo'''bar: definitely a bold-toggle.
A'''bar: definitely an apostrophe followed by an italic-toggle.

That means that [ '''hello a'''bar ] will render as
<b>hello a'<i>bar</i></b>, which is surprising, but if no one currently uses
that construct, maybe we can get away with it.
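
A rough Python sketch of the two idioms above. It assumes the distinction
is simply whether the word immediately before the three apostrophes is a
single letter; classify_triple() and that single-letter test are
assumptions made for illustration, not a description of what doQuotes()
does.

```python
import re


def classify_triple(text: str, pos: int) -> str:
    """Classify the ''' run that starts at text[pos].

    Assumed rule (illustrative only): a single-letter word directly
    before the run means "literal apostrophe + italic toggle";
    anything else means "bold toggle".
    """
    # Grab the run of word characters that ends right before the clump.
    match = re.search(r"(\w+)$", text[:pos])
    word = match.group(1) if match else ""
    if len(word) == 1:
        return "apostrophe+italic"
    return "bold"


if __name__ == "__main__":
    sample = "'''hello a'''bar"
    print(classify_triple(sample, 0))   # clump at the start of the line
    print(classify_triple(sample, 10))  # clump after the one-letter word "a"
```

Under that assumed rule the first clump in '''hello a'''bar is a bold
toggle and the second is an apostrophe plus an italic toggle, which is the
surprising rendering described above.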

Similarly, it might be worth investigating exactly what mid-word
multi-apostrophic constructs are used (yes, Jay, like you suggested...). In
French, d'* and l'* are used, and I guess an arbitrary number of others with
diminishing likelihood: qu'*, jusqu'*, s'*, and even m'*, t'*, etc.
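
One way to investigate this would be to count, over a dump of existing
wikitext, which words sit directly in front of a mid-word apostrophe clump.
The following Python sketch is only an illustration: count_prefixes() and
the definition of "mid-word" (word characters on both sides of a run of
two or more apostrophes) are assumptions, and the input is just an iterable
of lines.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# A word, then 2+ apostrophes, then (lookahead) another word character.
MIDWORD_CLUMP = re.compile(r"(\w+)('{2,})(?=\w)")


def count_prefixes(lines: Iterable[str]) -> Counter:
    """Count (lowercased prefix, clump length) pairs for mid-word clumps."""
    counts: Counter = Counter()
    for line in lines:
        for match in MIDWORD_CLUMP.finditer(line):
            key: Tuple[str, int] = (match.group(1).lower(), len(match.group(2)))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = ["L'''idée'' et d'''abord''", "'''hello''' world"]
    for (prefix, length), n in count_prefixes(sample).most_common():
        print(prefix, length, n)
```

Clumps that end a word (followed by a space or end of line) are ignored,
so ordinary '''bold''' spans do not show up in the counts.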

I hate the parser's (doQuotes()) current approach of trying to second-guess
what the user wants: we should be dictating the grammar, and either they are
using a rule we specify, or they aren't. I don't really care how complicated
the rules get, but we should be able to define them, stick them on a wall,
and tell people: if you're not using one of these rules, you're going to get
garbage.

Anyway. If we follow the approach I mentioned earlier, then all we have to
do is parse apostrophe clumps as a single unit, then sort them out in a
second step. Hopefully we can call that function something cute like
resolveApostrophalypticChaos(). It doesn't really have much
impact on the grammar apart from that, so we've probably discussed it
enough.
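
A minimal Python sketch of that two-step idea, under stated assumptions.
split_clumps() and resolve_clumps() are hypothetical names, and the
resolution policy (4 apostrophes = literal apostrophe plus bold, more than
5 = extra literal apostrophes, and re-reading the first bold toggle as
apostrophe plus italic when both toggle counts are odd) is a simplified
stand-in chosen for illustration, not the actual doQuotes() logic.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value); kind is "text", "italic" or "bold"


def split_clumps(line: str) -> List[str]:
    """Step 1: keep every run of 2+ apostrophes as a single element.

    re.split with a capturing group puts plain text at even indices and
    apostrophe clumps at odd indices.
    """
    return re.split(r"('{2,})", line)


def resolve_clumps(parts: List[str]) -> List[Token]:
    """Step 2: turn each clump into toggles plus literal apostrophes."""
    tokens: List[Token] = []
    for i, part in enumerate(parts):
        if i % 2 == 0:  # plain text
            if part:
                tokens.append(("text", part))
            continue
        n = len(part)
        if n == 2:
            tokens.append(("italic", ""))
        elif n == 3:
            tokens.append(("bold", ""))
        elif n == 4:
            tokens += [("text", "'"), ("bold", "")]
        else:  # 5 or more: surplus apostrophes become text
            if n > 5:
                tokens.append(("text", "'" * (n - 5)))
            tokens += [("bold", ""), ("italic", "")]

    # Assumed balancing rule: if both toggle counts are odd, re-read the
    # first bold toggle as a literal apostrophe followed by an italic toggle.
    bold = sum(1 for kind, _ in tokens if kind == "bold")
    italic = sum(1 for kind, _ in tokens if kind == "italic")
    if bold % 2 == 1 and italic % 2 == 1:
        idx = next(i for i, (kind, _) in enumerate(tokens) if kind == "bold")
        tokens[idx:idx + 1] = [("text", "'"), ("italic", "")]
    return tokens


if __name__ == "__main__":
    for line in ["L'''idée''", "L''''idée'''"]:
        print(line, resolve_clumps(split_clumps(line)))
```

With this policy the two French examples quoted at the top of the thread
come out as apostrophe plus italics and apostrophe plus bold respectively.
The point is only that the first step needs no lookahead at all; every
judgment call lives in the second step, where the rules can be written
down and changed without touching the grammar.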

Steve
