On Wed, Nov 14, 2007 at 12:27:51AM +1100, Steve Bennett wrote:
On 11/13/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
What's the best way to approach parsing a long string of formatted text:
1) Treat each occurrence of ''' or '' as an element to be translated into
<b>, <i>, </b>, or </i>, using state ("context"?) to determine which
2) Have a rule that treats an entire run of '''........''' as a single
element, to be transformed into <b>.......</b>.
To answer my own question, I don't think 2) is possible, due to the
legitimacy of constructs like:
Here is some ''italics with a [[link|that switches ''off]] the italics.
I think '' and ''' will have to be parsed as rather ambiguous "toggle state
of bold/italics" tokens, whose meaning can be made clearer by walking the
AST afterwards.
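A rough sketch of what that two-pass idea might look like, in Python. This
is only an illustration, not how the MediaWiki parser actually works: the
names tokenize and resolve are made up here, links are left as plain text,
and only runs of exactly '' or ''' are recognised (longer apostrophe runs
get no special handling).

```python
import re

# Simplifying assumption: only ''' and '' are recognised as toggles.
TOKEN_RE = re.compile(r"('''|'')")

TAGS = {"BOLD": "b", "ITALIC": "i"}


def tokenize(text):
    """First pass: emit TEXT tokens and ambiguous BOLD/ITALIC toggle tokens."""
    tokens = []
    for piece in TOKEN_RE.split(text):
        if piece == "'''":
            tokens.append(("BOLD", piece))
        elif piece == "''":
            tokens.append(("ITALIC", piece))
        elif piece:  # skip empty strings produced by re.split
            tokens.append(("TEXT", piece))
    return tokens


def resolve(tokens):
    """Second pass: decide from running state whether a toggle opens or closes."""
    state = {"BOLD": False, "ITALIC": False}
    out = []
    for kind, value in tokens:
        if kind == "TEXT":
            out.append(value)
            continue
        tag = TAGS[kind]
        out.append(f"</{tag}>" if state[kind] else f"<{tag}>")
        state[kind] = not state[kind]
    return "".join(out)


sample = "Here is some ''italics with a [[link|that switches ''off]] the italics."
print(resolve(tokenize(sample)))
```

On the link example above, the second '' resolves to a closing </i> inside
the link text, which is exactly the overlap that a block-style rule for
''..'' cannot express.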
It's a pity, because the existing work on the EBNF assumed that they could
be treated as blocks.
http://www.mediawiki.org/wiki/Markup_spec (was at meta)
Unless someone wants to jump in and claim that the above construct is a
mistake and that ''..'' *should* be a block of some kind.
Right here, Steve, you're hitting on the underlying problem with this
project: some behavior of the current parser is defined and
intentional, and some of it is probably an accident of the
implementation.
Distinguishing these is probably a) important and b) impossible.
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra(a)baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com '87 e24
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274