-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Eric Astor
Sent: 12 February 2007 03:03
To: Wikimedia developers
Subject: Re: [Wikitech-l] WYSIWYG (or WYSIWYM or WYSIWYM) status?
Nick Jenkins wrote:
As one of the many people who's done so, I agree. :) The problem is that ~80% of wikimarkup is pretty straightforward to parse using standard methods, another 10-15% can be done without huge difficulty using known-but-less-standard methods, and the remaining 5% doesn't fit well at all into any of the normal models of lexing/parsing.
[...snip...]
-Mark
Can I maybe suggest please giving some examples that you encountered of the 10-15% hard category, and the 5% very hard category?
I ask so that if anyone feels tempted to start on defining the behaviour, we can gently suggest doing the harder stuff *first* (with examples), thus hopefully preventing the situation where we have multiple unfinished 80%-done definitions, and no 100%-complete formal definitions.
All the best, Nick.
Just one example - probably of the 5% very hard category:
'''''hello''' hi'' vs. '''''hi'' hello'''
Rendered in HTML, the first reads <i><b>hello</b> hi</i>, and the second reads <b><i>hi</i> hello</b>. The problem is that the meaning of the first 5 quotes changes based on the order in which the bold and italic regions close - which is not determined while scanning left-to-right.
Another example:
'''hello ''hi''' there''
MediaWiki renders this as <b>hello <i>hi</i></b><i> there</i>, properly handling overlapping formatting.
There are ways to deal with these... putting off the resolution until a later pass is the only way I know of that deals with the first one, and it's a bit touchy. Manageable, but touchy.
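A minimal sketch of that deferred resolution, in Python. It is not MediaWiki's code: the name render_quotes, the placeholder trick, and the handling of runs of 4 or more than 5 quotes (extra quotes kept as literal text) are assumptions made for illustration. An opening run of 5 quotes is left as a placeholder in the output and only filled in once the first closing run shows which region is the inner one. It works on one line at a time and does no HTML escaping.

```python
import re

def render_quotes(line):
    out = []        # output fragments; None marks an unresolved ''''' opener
    stack = []      # currently open tags, outermost first
    pending = None  # index in `out` of the unresolved ''''' opener, if any

    def resolve(inner):
        # The region that closes first is the inner one, so fix the order now.
        nonlocal pending
        outer = "i" if inner == "b" else "b"
        out[pending] = f"<{outer}><{inner}>"
        stack.extend([outer, inner])
        pending = None

    def toggle(tag):
        if pending is not None:
            resolve(tag)
        if tag not in stack:
            stack.append(tag)
            out.append(f"<{tag}>")
            return
        # Overlap: close anything opened after `tag`, close it, then reopen.
        reopen = []
        while stack[-1] != tag:
            inner = stack.pop()
            out.append(f"</{inner}>")
            reopen.append(inner)
        stack.pop()
        out.append(f"</{tag}>")
        for inner in reversed(reopen):
            stack.append(inner)
            out.append(f"<{inner}>")

    # re.split with a capture group puts the quote runs at odd indices.
    for index, piece in enumerate(re.split(r"('{2,})", line)):
        if index % 2 == 0:
            out.append(piece)
            continue
        n = len(piece)
        if n == 4 or n > 5:
            # Assumption: surplus quotes are emitted as literal text.
            extra = 1 if n == 4 else n - 5
            out.append("'" * extra)
            n -= extra
        if n == 2:
            toggle("i")
        elif n == 3:
            toggle("b")
        elif pending is None and not stack:
            pending = len(out)   # decide between <i><b> and <b><i> later
            out.append(None)
        else:
            if pending is not None:
                resolve("b")
            # Close whatever is open (innermost first), then open the rest.
            for tag in stack[::-1] + [t for t in "ib" if t not in stack]:
                toggle(tag)

    if pending is not None:
        resolve("b")             # never closed: fall back to <i><b>
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(render_quotes("'''''hello''' hi''"))        # <i><b>hello</b> hi</i>
print(render_quotes("'''''hi'' hello'''"))        # <b><i>hi</i> hello</b>
print(render_quotes("'''hello ''hi''' there''"))  # <b>hello <i>hi</i></b><i> there</i>
```

The touchy part is the placeholder: everything emitted after the 5 quotes has to stay valid for either nesting order until the first closer arrives.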
I think the easiest method (and the one that comes nearest to keeping it a single pass) is to build a DOM. That guarantees valid XML output every time, which I believe the MediaWiki parser doesn't always do.
It also makes it easy to go back and fix up the DOM tree if the parser has made a wrong initial choice. For example:
'''italics''
It might start out as <b>italics</b>, but on seeing the '' it can be corrected to '<i>italics</i>.
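Roughly what I mean, sketched with Python's xml.dom.minidom. The helper name demote_bold_to_italic and the <p> wrapper are made up for the example, and a real parser would decide for itself when to call it; this only shows the tree fix-up.

```python
from xml.dom.minidom import Document

def demote_bold_to_italic(doc, bold):
    """Rewrite <b>...</b> in place as a literal ' followed by <i>...</i>."""
    parent = bold.parentNode
    italic = doc.createElement("i")
    while bold.firstChild:
        # appendChild moves the node out of <b> and into <i>
        italic.appendChild(bold.firstChild)
    parent.insertBefore(doc.createTextNode("'"), bold)
    parent.replaceChild(italic, bold)
    return italic

def demo():
    doc = Document()
    root = doc.createElement("p")   # hypothetical container element
    doc.appendChild(root)

    # Initial (wrong) guess after reading ''' : open a bold element.
    bold = doc.createElement("b")
    root.appendChild(bold)
    bold.appendChild(doc.createTextNode("italics"))

    # The parser now sees '' with no italic open, so it corrects the tree.
    demote_bold_to_italic(doc, bold)
    return root.toxml()

print(demo())  # <p>'<i>italics</i></p>
```

Because the output is serialised from the tree, whatever the fix-up does, the result is still well-formed.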
Jared