On 2/11/07, Eric Astor <eastor1(a)swarthmore.edu> wrote:
> Just one example - probably of the 5% very hard category:
> '''''hello''' hi''
> vs.
> '''''hi'' hello'''
> Rendered in HTML, the first reads <i><b>hello</b> hi</i>, and the second
> reads <b><i>hi</i> hello</b>. The problem is that the meaning of the
> first 5 quotes changes based on the order in which the bold and italic
> regions close - which is not determined while scanning left-to-right.
This is where we could redefine the behavior slightly. Have '''''
always be <b><i>. Then, if ''' occurs first, output </i></b><i>. On
the other hand, from what you say next, I'm not sure that will help.
> Another example:
> '''hello ''hi''' there''
> MediaWiki renders this as <b>hello <i>hi</i></b><i> there</i>, properly
> handling overlapping formatting.
> There are ways to deal with these... putting off the resolution until a
> later pass is the only way I know of that deals with the first one, and
> it's a bit touchy. Manageable, but touchy.
Well, we could just output invalid XML for both these cases and then
fix it in the Sanitizer/Tidy pass, I guess. In some clearly defined
manner, of course, perhaps stated informally in the grammar, or
formally as a separate non-parsing algorithm.
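For comparison, here is a rough Python sketch of the single-pass idea from
earlier: ''''' always opens <b><i>, and when a closing marker hits a tag that
is not the innermost one, everything inside it is closed and then reopened.
This is only an illustration, not what the current parser does. The names
render_quotes and toggle are made up, it looks only at runs of exactly 2, 3
or 5 apostrophes (the real apostrophe rules have more cases), and closing
leftover tags at end of line is my assumption.

```python
import re

# Longest run first, so ''''' is not split into ''' + ''.
QUOTE_RUN = re.compile(r"('{5}|'{3}|'{2})")
TAG_FOR = {"'''": "b", "''": "i"}


def render_quotes(line):
    """Left-to-right pass that always emits properly nested tags."""
    out = []
    stack = []  # currently open tags, outermost first

    def toggle(tag):
        if tag not in stack:
            stack.append(tag)
            out.append(f"<{tag}>")
            return
        # Close down to and including `tag`, then reopen whatever was inside it.
        idx = stack.index(tag)
        inner = stack[idx + 1:]
        for t in reversed(stack[idx:]):
            out.append(f"</{t}>")
        del stack[idx:]
        for t in inner:
            stack.append(t)
            out.append(f"<{t}>")

    for piece in QUOTE_RUN.split(line):
        if piece == "'''''":
            # Toggle both: close open tags innermost first, then open the
            # missing ones with bold outermost (so an empty stack gives <b><i>).
            order = list(reversed(stack)) + [t for t in ("b", "i") if t not in stack]
            for t in order:
                toggle(t)
        elif piece in TAG_FOR:
            toggle(TAG_FOR[piece])
        else:
            out.append(piece)  # plain text, including single apostrophes

    # Assumption: anything still open is closed at end of line.
    for t in reversed(stack):
        out.append(f"</{t}>")
    return "".join(out)


print(render_quotes("'''''hello''' hi''"))        # <b><i>hello</i></b><i> hi</i>
print(render_quotes("'''''hi'' hello'''"))        # <b><i>hi</i> hello</b>
print(render_quotes("'''hello ''hi''' there''"))  # <b>hello <i>hi</i></b><i> there</i>
```

The first case comes out as valid but slightly redundant markup compared to
<i><b>hello</b> hi</i>, which is the price of not looking ahead. The third
case matches the MediaWiki output quoted above.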