2010-08-10 12:45, Thomas Dalton wrote:
On 10 August 2010 11:09, David Gerard <dgerard@gmail.com> wrote:
On 9 August 2010 17:04, Mark Clements (HappyDog) <gmane@kennel17.co.uk> wrote:
This kind of unexpected edge-case is arguably something that should be fixed in any formal markup specification.
How prevalent is it in actual wikitext? Is it an edge case people actually use much, or are all instances of it basically errors? That'll be the question.
Its only potential use is in making the wikitext more easily readable, which doesn't seem important enough to warrant just a weird edge-case. Any formal spec is going to end up breaking things, that can't really be helped (unless we just write down a spec for the current behaviour, bugs and all, which sounds like a lost opportunity to me).
If you consider the large body of information tied to MediaWiki syntax, it is likely that for any border case, there is a revision of some page that will trigger that border case.
Regarding strategy on how to replace the MediaWiki parser, I can see two extremes:
1. Search out all weird edge cases and reproduce them in parser rules. Walk through the revisions of Wikipedia and, for each edge case, note all revisions for which the parser rule for that edge case is executed. Based on the data, determine which edge cases can be safely removed, or define a conversion for the affected content.
2. Don't support any edge cases. Just consider the content broken and let the wiki users themselves fix it. Historic revisions of pages will be permanently broken.
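The survey in strategy 1 could be sketched roughly as follows. This is only an illustration: the two detector patterns and their names are hypothetical stand-ins, and a real survey would record which rule the parser itself executes rather than approximate it with regular expressions. The input is assumed to be an iterable of (revision id, wikitext) pairs.

```python
import re

# Hypothetical detectors, one per edge case. These regexes are rough
# approximations for illustration, not rules from any real parser.
EDGE_CASES = {
    "apostrophes_in_internal_link": re.compile(r"\[\[[^\]\n]*''[^\]\n]*\]\]"),
    "text_after_table_start": re.compile(
        r"^\{\|[^\n]*\n(?![|!{])[^\n]+", re.MULTILINE
    ),
}


def survey(revisions):
    """Map each edge case name to the revision ids that trigger it."""
    hits = {name: [] for name in EDGE_CASES}
    for rev_id, wikitext in revisions:
        for name, pattern in EDGE_CASES.items():
            if pattern.search(wikitext):
                hits[name].append(rev_id)
    return hits


if __name__ == "__main__":
    sample = [
        (1, "[[Rock ''n'' Roll]]"),
        (2, "{|\nstray text\n|-\n| cell\n|}"),
        (3, "plain ''italic'' text"),
    ]
    for name, rev_ids in survey(sample).items():
        print(name, rev_ids)
```

An edge case whose list stays empty, or contains only a handful of revisions, would be a candidate for removal or for a one-off content conversion.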
I am trying to support as many edge cases as is reasonable in my attempt to write a new parser. It seems, however, that the existing parser is still being actively developed, and that backward compatibility with edge cases may not be much of a concern. For instance, in 1.16.0beta3 we have:
    $text = $this->doAllQuotes( $text );
    $text = $this->replaceInternalLinks( $text );
    $text = $this->replaceExternalLinks( $text );
which in trunk is:
    $text = $this->replaceInternalLinks( $text );
    $text = $this->doAllQuotes( $text );
    $text = $this->replaceExternalLinks( $text );
So it is now possible to have apostrophes in internal links, but still not in external links.
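A toy model shows why the order of the passes matters. The functions below borrow their names from the MediaWiki calls above, but they are simplified stand-ins written for this illustration, not the real implementations: the quote pass only handles italics, the link pass only handles plain [[Target]] links, and the placeholder scheme is an assumption made to keep the example small.

```python
import re

# Toy link syntax: [[Target]] where the target must not contain markup.
LINK_RE = re.compile(r"\[\[([^\[\]|<>]+)\]\]")
# Toy quote syntax: ''text'' becomes italics.
ITALIC_RE = re.compile(r"''(.+?)''")
HOLDER_RE = re.compile("\x7fLINK(\\d+)\x7f")


def do_all_quotes(text):
    return ITALIC_RE.sub(r"<i>\1</i>", text)


def replace_internal_links(text, holders):
    """Stash each link target and leave a placeholder in the text."""
    def stash(match):
        holders.append(match.group(1))
        return "\x7fLINK%d\x7f" % (len(holders) - 1)
    return LINK_RE.sub(stash, text)


def expand_holders(text, holders):
    """Turn placeholders back into anchor tags once all passes are done."""
    def expand(match):
        title = holders[int(match.group(1))]
        return '<a href="/wiki/%s">%s</a>' % (title.replace(" ", "_"), title)
    return HOLDER_RE.sub(expand, text)


def render(text, links_first):
    holders = []
    if links_first:  # order as in trunk
        text = replace_internal_links(text, holders)
        text = do_all_quotes(text)
    else:            # order as in 1.16.0beta3
        text = do_all_quotes(text)
        text = replace_internal_links(text, holders)
    return expand_holders(text, holders)


if __name__ == "__main__":
    source = "[[Rock ''n'' Roll]] is ''loud''"
    print(render(source, links_first=False))
    print(render(source, links_first=True))
```

With the quote pass first, the apostrophes inside the brackets are turned into markup before the link pass sees them, so the toy link pass no longer recognises the link. With the link pass first, the target is protected by the placeholder and survives intact.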
From the parser's point of view, the edge cases can be divided into "harmless", where a rule to support them does not increase the complexity of the parser significantly, and "harmful", where adding a rule to support them would either dramatically increase the size of the parser or make it possible to craft content that takes more than linear time or memory to process. The edge cases surrounding links definitely fall into the harmful category. I will be writing a separate post about links later.
Maybe it would be a good idea to provide some feedback to the user regarding bad syntax. In my parser implementation, I am considering generating special events for syntax that should be avoided. For instance:
begin_table:
    begin = BEGIN_TABLE NEWLINE*
    (
        { X->beginGarbageBlock(X, "Unsupported syntax: content between the {| and the first column in a table."); }
        ( (inline_element)=> garbage_inline_text NEWLINE* )* block_elements?
        { X->endGarbageBlock(X); }
    )*
    { X->beginTable(X, $begin->custom); }
    ;
This could, for instance, be rendered in HTML as: <div class="garbage" title="Unsupported syntax: content between ..."> </div>.
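On the receiving side, a listener for these events might look something like the sketch below. The class and method names are hypothetical Python counterparts of the beginGarbageBlock and endGarbageBlock callbacks in the grammar; the actual parser emits its events from C, and how it delivers the enclosed text is an assumption here.

```python
from html import escape


class HtmlEventSink:
    """Collects parser events and renders them as an HTML string."""

    def __init__(self):
        self.parts = []

    def begin_garbage_block(self, message):
        # The message goes into an attribute, so quotes must be escaped too.
        self.parts.append(
            '<div class="garbage" title="%s">' % escape(message, quote=True)
        )

    def end_garbage_block(self):
        self.parts.append("</div>")

    def text(self, content):
        # Garbage content is shown literally, never interpreted as markup.
        self.parts.append(escape(content))

    def result(self):
        return "".join(self.parts)


if __name__ == "__main__":
    sink = HtmlEventSink()
    sink.begin_garbage_block(
        "Unsupported syntax: content between the {| and the first column in a table."
    )
    sink.text("stray text")
    sink.end_garbage_block()
    print(sink.result())
```

Keeping the warning in the title attribute means a skin can style the "garbage" class to draw attention to the block without altering the page text itself.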
/Andreas