I originally wrote the EBNF at Meta for personal amusement - I had a vague idea that it might be useful to someone, but as I say, I'm no computer scientist. I do believe an EBNF expression of the syntax is impossible, but that doesn't appear to be a problem given what we actually want such an expression for.
I'm assuming our problem is this: currently we "parse" wikitext by immediately converting it, via regex, into XHTML. This is not really "parsing", because parsing usually means creating an abstract Document Object Model which is then walked to generate XHTML, XML, FooBar or whatever (or so I have learnt). Because we're missing this DOM, wikitext can't be used by anything other than the current parser (so we can't do WYSIWYG, etc.). However, there appears to be no way of creating a DOM from wikitext, because that would mean standardising the way syntax converts to output, and any kind of standardisation will break backwards compatibility.
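To make the distinction concrete, here is a toy Python sketch, not anything MediaWiki actually does. It handles only '''bold''', and the names `Text`, `Bold`, `parse`, `to_xhtml` and `to_plain` are made up for illustration. The first function converts straight to XHTML; the others build a small tree first, which can then be rendered more than one way.

```python
import re
from dataclasses import dataclass, field
from html import escape


# Hypothetical node types for a toy tree; only '''bold''' is recognised.
@dataclass
class Text:
    value: str


@dataclass
class Bold:
    children: list = field(default_factory=list)


def regex_render(src):
    """Direct conversion: wikitext in, XHTML out, nothing in between."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", escape(src, quote=False))


def parse(src):
    """Build a list of nodes instead of emitting output straight away."""
    nodes = []
    # re.split with a capture group puts the bold bodies at odd indices.
    for i, part in enumerate(re.split(r"'''(.+?)'''", src)):
        if not part:
            continue
        nodes.append(Bold([Text(part)]) if i % 2 else Text(part))
    return nodes


def to_xhtml(nodes):
    """One possible back end: walk the tree and emit XHTML."""
    out = []
    for node in nodes:
        if isinstance(node, Bold):
            out.append("<b>" + to_xhtml(node.children) + "</b>")
        else:
            out.append(escape(node.value, quote=False))
    return "".join(out)


def to_plain(nodes):
    """A second back end over the same tree: markup stripped."""
    return "".join(
        to_plain(n.children) if isinstance(n, Bold) else n.value for n in nodes
    )


tree = parse("a '''b''' & c")
print(regex_render("a '''b''' & c"))
print(to_xhtml(tree))
print(to_plain(tree))
```

Both routes give the same XHTML here; the difference is that only the second leaves a structure behind that a WYSIWYG editor or another output format could use.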
So our problem is a dilemma: either we standardise and lose backwards compatibility, or we don't and lose extensibility. In the long run, I think the first option is better. However, in standardising the language we'd lose the property that all syntax is valid (useful, as people can then never be presented with error messages on their pages), so ideally the move toward standardisation would have to be accompanied by a switch to WYSIWYG editing, such that the code is out of reach and is forced to be always valid.
On the point of immutable validity, it is perhaps less useful for all text to be valid than for there to be "invalid markup" error messages. Whilst the former ensures users can never really "go wrong", it is still true that bad markup will lead to results they very much didn't intend - and it seems to me more useful to give them an error message than a wildly unintended result.
On 13/11/2007, Steve Bennett stevagewp@gmail.com wrote:
On 11/14/07, David Gerard dgerard@gmail.com wrote:
- is completely inappropriate in a discussion of 1. And that it's important doesn't change that. A redesigned parser must present an almost-identical interface to the present implementation to get in; this is NOT the place to argue for syntax changes, and any attempts to change the syntax will in fact doom the effort.
The exception to that is where crazy, unuseful syntax is actually a hindrance to the definition of an EBNF grammar and its implementation, as we've discussed earlier.
I have to say, I'm finding some really whacky things that work. Try this code:
o
Or how about an [[Image:foo.jpg|With an [[Image:foo.jpg]] in its caption...]] ?
Or even one with the table of contents: [[Image:foo.jpg|__TOC__]]
How do you think this <pretty> little piece of text renders?
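As a rough illustration of why the nested-image case is awkward for a single regex, here is a small Python sketch. It is an assumption-laden toy, not how Parser.php works: `parse_links` is a made-up helper that only understands `[[` and `]]`, and it treats an unclosed `[[` as running to the end of the text.

```python
import re


def parse_links(src, pos=0, depth=0):
    """Return (nodes, next_pos); a node is a str or a list for one [[...]]."""
    nodes, buf = [], []
    while pos < len(src):
        if src.startswith("[[", pos):
            if buf:
                nodes.append("".join(buf))
                buf = []
            # Recurse so that links inside links become nested lists.
            inner, pos = parse_links(src, pos + 2, depth + 1)
            nodes.append(inner)
        elif depth and src.startswith("]]", pos):
            if buf:
                nodes.append("".join(buf))
            return nodes, pos + 2
        else:
            buf.append(src[pos])
            pos += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, pos


src = "[[Image:foo.jpg|With an [[Image:foo.jpg]] in its caption...]] ?"

# A non-greedy regex stops at the first "]]", cutting the outer link short.
print(re.findall(r"\[\[(.+?)\]\]", src))

# The recursive version keeps the inner image inside the outer caption.
tree, _ = parse_links(src)
print(tree)
```

The regex returns the outer link truncated at the inner closing brackets, while the recursive version keeps the inner image as a child of the caption.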
Incidentally, I'm making good progress on the grammar. I've merged in most of what was at meta, so at least there is only *one* grammar now (though part of it is EBNF and the rest is BNF). http://www.mediawiki.org/wiki/Markup_spec/BNF/Article
A recurring question, though, is who is actually going to write this parser. Parser.php is 5000 lines and Sanitizer.php another 1300. And there are probably other files I don't even know about. We're talking about months of work in the dark, with no feedback, and no guarantee that it will even get used. We're going to have to come up with a better coding process than "you write the code and when you're done we'll look at it".
Steve
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l