On 8/17/06, Steve Bennett stevage@gmail.com wrote:
Is this not the sort of "backwards compatibility" that we could safely do without? Does anyone intentionally use that kind of construct?
Maybe, maybe not, but people expect that if there's a ''' open, then ''' will close it. They won't expect an intervening '' (or [[ or ] or anything else) to affect matters. Besides, what would you suggest they type, '''hi''''''''hello'''hi'''hello''''''''hi'''? God only knows what that would do.
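(To see why that string is hopeless: any run of more than five apostrophes can be split into '' and ''' pieces in several different ways, so whatever a tokenizer does with it is essentially arbitrary. A quick way to look at the runs, as a toy Python sketch:)

import re

# Print every run of apostrophes in the example above; runs longer than
# five have no single obvious reading as '' / ''' / ''''' markers.
text = "'''hi''''''''hello'''hi'''hello''''''''hi'''"
for run in re.finditer(r"'{2,}", text):
    print("run of %d apostrophes at offset %d" % (len(run.group(0)), run.start()))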
That's sort of a given, isn't it? What's the downside of doing transclusion first?
Well, I don't think it's so much a downside as something that's impossible to work into a formal grammar. I'm guessing the issue is that templates mix syntax with semantics: what a template expands to can change the structure of the parse tree around it (it can contribute half of a table, or the opening ''' of a bold span). So a pass to replace all the templates has to be done before you can even start talking about a formal grammar. But it's a given, yes, as you say.
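Something like this toy pre-expansion pass, say (in Python; the TEMPLATES table and the bare {{name}} handling are invented for illustration, and the real preprocessor also has to cope with parameters, parser functions, includeonly/noinclude, nesting and so on):

import re

# Hypothetical template store; the real thing would come from the database.
TEMPLATES = {"stub": "This article is a [[stub]]."}

def expand_templates(wikitext, depth=0, max_depth=40):
    # Naively substitute {{name}} before any tokenizing happens, so the
    # grammar only ever sees the expanded text.
    if depth > max_depth:
        return wikitext  # bail out on runaway recursion
    def replace(match):
        name = match.group(1).strip()
        if name not in TEMPLATES:
            return match.group(0)          # leave unknown templates alone
        return expand_templates(TEMPLATES[name], depth + 1, max_depth)
    return re.sub(r"\{\{([^{}|]+)\}\}", replace, wikitext)

print(expand_templates("'''Foo''' is a thing. {{stub}}"))
# '''Foo''' is a thing. This article is a [[stub]].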
What's a freelink?
A URL-like thing that was typed without any particular surrounding syntax (it gets autolinked). Similar lookahead would presumably be necessary for RFCs, ISBNs, and PMIDs (okay, that's enough to convince me to agree that they should be ditched :) ). In general, a lookahead of no more than one character is considered desirable.
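Roughly this sort of thing, in toy form (the pattern and protocols are just illustrative; the real code also trims trailing punctuation and handles the RFC/ISBN/PMID magic links). The awkward part for a grammar is that deciding whether ordinary text is about to turn into a link means peeking several characters ahead:

import re

# Toy freelink/autolink pass for bare URLs sitting in plain text.
FREELINK = re.compile(r"\b(?:https?|ftp)://[^\s<>\[\]]+")

def autolink(text):
    return FREELINK.sub(
        lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), text)

print(autolink("See http://example.org/page for details."))
# See <a href="http://example.org/page">http://example.org/page</a> for details.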
Oh, right - and we'd need to special-case every tag-style piece of markup, including every allowed HTML tag, since formal grammars generally can't reference previously-matched text: there's no way to write one rule saying "the closing tag has the same name as the opening tag", so each tag needs its own production. This also applies to the heading levels - we'd need separate ad-hoc constructs for each level of heading we wanted to support, duplicating a lot of the grammar between each one.
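The heading case is easy to see with a quick sketch: a regex backreference (which is exactly the "reference previously-matched text" that a pure context-free grammar doesn't give you) covers all six levels in one rule, and without it you have to write the levels out by hand:

import re

# One pattern with a backreference handles every heading level ...
heading = re.compile(r"^(={1,6})\s*(.*?)\s*\1\s*$")
m = heading.match("=== Section title ===")
print(len(m.group(1)), m.group(2))   # 3 Section title

# ... whereas a grammar without backreferences has to spell each level
# out separately, which is the duplication being complained about:
#   heading1 <- "=" text "="
#   heading2 <- "==" text "=="
#   ...
#   heading6 <- "======" text "======"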
I don't understand, can you give an example?
He *seems* to be saying that you'd have to make special rules for each allowed HTML tag, and presumably each allowed attribute and property thereof, and maybe even every combination of them (!). Would there be any advantage in leaving those out of the grammar and keeping Parser and Sanitizer separate as they are now?
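Something along these lines is what I'm picturing (names and whitelist are completely made up; the real Sanitizer's tag and attribute handling is far more involved): the grammar would only know the general shape of a tag, and a later pass would decide whether it's allowed.

import re

# The tokenizer only recognizes the shape of a tag; a separate pass
# checks it against the whitelist, so the grammar never has to
# enumerate the allowed tags or their attributes.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")
ALLOWED = {"b", "i", "sup", "sub", "blockquote"}   # stand-in whitelist

def sanitize(html):
    def check(match):
        if match.group(1).lower() in ALLOWED:
            return match.group(0)                   # keep the tag
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")
    return TAG.sub(check, html)

print(sanitize("<b>bold</b> but <script>alert(1)</script> gets escaped"))
# <b>bold</b> but &lt;script&gt;alert(1)&lt;/script&gt; gets escaped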