On 11/13/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
What's the best way to approach parsing a long string of formatted text?
1) Treat each occurrence of ''' or '' as an element to be translated into
   <b>, <i>, </b>, or </i>, using state ("context"?) to determine which.
2) Have a rule that treats an entire run of '''........''' as a single
   element, to be transformed into <b>.......</b>.
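For comparison, option 1) on a flat string can be sketched in a few lines of Python. This is a simplified illustration, not MediaWiki's actual quote handling: the function name toggle_quotes is hypothetical, ''' is assumed to take precedence over '' (so ''''' splits as ''' + ''), and mis-nested bold/italic runs are not repaired.

```python
import re

# Hypothetical sketch of option 1): every ''' or '' is a toggle token, and
# the current open/closed state decides whether it becomes <x> or </x>.
# ''' is tried before '' in the alternation, so ''''' splits as ''' + ''.
QUOTE_RUN = re.compile(r"'''|''")
TAGS = {"''": "i", "'''": "b"}


def toggle_quotes(text: str) -> str:
    is_open = {"''": False, "'''": False}  # is italic / bold currently open?
    out = []
    pos = 0
    for m in QUOTE_RUN.finditer(text):
        out.append(text[pos:m.start()])
        mark = m.group()
        tag = TAGS[mark]
        out.append(f"</{tag}>" if is_open[mark] else f"<{tag}>")
        is_open[mark] = not is_open[mark]
        pos = m.end()
    out.append(text[pos:])
    # Close anything still open at the end of the string.
    # (Nesting order is not repaired in this simplified sketch.)
    for mark in ("''", "'''"):
        if is_open[mark]:
            out.append(f"</{TAGS[mark]}>")
    return "".join(out)


print(toggle_quotes("some ''italic'' and '''bold''' text"))
```

The state here is a single flag per markup type for the whole string, which is exactly what breaks down once links introduce nested levels.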
To answer my own question, I don't think 2) is possible, due to the
legitimacy of constructs like:
Here is some ''italics with a [[link|that switches ''off]] the italics.
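To make the failure of 2) concrete, here is a deliberately naive Python sketch of it: a regex that pairs each '' with the next '' in the raw string. The name pair_italics is hypothetical, and this is not taken from any real parser; it only shows what happens when pairing ignores the [[...]] structure.

```python
import re


# Naive take on option 2): wrap everything between a '' and the next ''.
# The regex sees only characters, not the [[...]] link structure.
def pair_italics(text: str) -> str:
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


src = "Here is some ''italics with a [[link|that switches ''off]] the italics."
print(pair_italics(src))
```

The closing </i> lands inside the link brackets ("switches </i>off]]"), so once the link becomes <a>...</a> the <i> and <a> elements would overlap instead of nesting.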
That's something of a scary-looking case we might label pathological,
but we might well see something like this:
''See also [[HMS Pinafore|the operetta ''HMS Pinafore'']] for some stuff.''
where we want to toggle italic state _entirely_ within link text. Even
if we only handle start-end pairs on the same level of the parse tree,
it's necessary to keep track of the parent's state to know how to handle
it in the child.
An acceptable rendering might be:
<p><i>See also </i><a><i>the operetta </i>HMS Pinafore<i></i></a><i> for some stuff.</i></p>
(That empty <i></i> at the end of the link can then be elided.)
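A minimal Python sketch of how that rendering could be produced, assuming a parse tree has already been built (by hand here; there is no wikitext parser in this sketch). The Link node, the function names, and the rule that a link's final italic state is discarded when control returns to the parent are all hypothetical choices for illustration. Bold is ignored, and the <a> carries no href, mirroring the rendering above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Link:
    """Hypothetical parse-tree node for [[target|label]]; only the label is kept."""
    label: str


Node = Union[str, Link]


def render_run(text: str, italic: bool) -> Tuple[str, bool]:
    """Render one flat text run, starting in the inherited italic state.

    Every '' flips the state, and each segment is wrapped on its own,
    so no <i> stays open past the end of this run.
    """
    out = []
    for i, segment in enumerate(text.split("''")):
        if i > 0:
            italic = not italic
        out.append(f"<i>{segment}</i>" if italic else segment)
    return "".join(out), italic


def render_paragraph(nodes: List[Node]) -> str:
    italic = False
    out = []
    for node in nodes:
        if isinstance(node, Link):
            # The child starts in the parent's state; its final state is
            # discarded (assumption: toggles pair up within each level).
            inner, _ = render_run(node.label, italic)
            out.append(f"<a>{inner}</a>")
        else:
            html, italic = render_run(node, italic)
            out.append(html)
    return "<p>" + "".join(out) + "</p>"


def elide_empty_italics(html: str) -> str:
    return html.replace("<i></i>", "")


# ''See also [[HMS Pinafore|the operetta ''HMS Pinafore'']] for some stuff.''
doc = ["''See also ", Link("the operetta ''HMS Pinafore''"), " for some stuff.''"]
html = render_paragraph(doc)
print(html)
print(elide_empty_italics(html))
```

Because each run closes its own <i> before any node boundary, the output is always properly nested; the cost is the extra close/reopen pairs and the occasional empty <i></i>, which the last step strips.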
Alternatively we can go totally crazy and use CSS to create some kind of
anti-italic element.... >:D
<p><i>See also <a>the operetta <i class="plain">HMS Pinafore</i></a> for some stuff.</i></p>
But that way may lead to madness.
-- brion vibber (brion @