On 8/17/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/17/06, Steve Bennett stevage@gmail.com wrote:
Is this not the sort of "backwards compatibility" that we could safely do without? Does anyone intentionally use that kind of construct?
Maybe, maybe not, but people expect that if there's a ''' open, then ''' will close it. They won't expect an intervening '' (or [[ or ] or anything else) to affect matters. Besides, what would you suggest they type, '''hi''''''''hello'''hi'''hello''''''''hi'''? God only knows what that would do.
I don't think anyone would feel confident predicting what happens in any of these cases. Mostly it comes down to "try it and see".
In the case of '''fooo''boooo'''.... well, clearly something has gone wrong somewhere. How we choose to interpret it after that should be undefined.
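To make the '''fooo''boooo''' case concrete, here is a rough sketch of a naive toggle model. It is not what the MediaWiki parser actually does; the function name quote_events and the rule that only runs of exactly two or three apostrophes are meaningful are assumptions made for illustration.

```python
import re

def quote_events(text):
    """Naive model: '' toggles italic, ''' toggles bold, longer runs are flagged.

    Hypothetical helper for illustration only, not MediaWiki's algorithm.
    """
    bold = italic = False
    events = []
    # re.split with one capturing group alternates: text, run, text, run, ...
    for i, piece in enumerate(re.split(r"('{2,})", text)):
        if not piece:
            continue
        if i % 2 == 1:  # an apostrophe run
            n = len(piece)
            if n == 2:
                italic = not italic
                events.append("italic on" if italic else "italic off")
            elif n == 3:
                bold = not bold
                events.append("bold on" if bold else "bold off")
            else:
                events.append(f"ambiguous run of {n}")
        else:
            events.append(f"text {piece!r}")
    return events, bold, italic

events, bold, italic = quote_events("'''fooo''boooo'''")
```

Under this model the bold closes while the italic is still open, and the italic is never closed at all, which is the "something has gone wrong" state. The runs of eight apostrophes in the earlier example simply come out as "ambiguous", which is about as much as anyone could predict.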
Seriously though, whoever came up with ''..'' for italics and '''...''' for bold was, um, making life difficult!
Well, I don't think it's so much a downside as something that's impossible to work into a formal grammar. I'm guessing the issue is
Is it? Can't the formal grammar simply apply *after* all transclusions? Like in C you have the "preprocessor grammar" and then the actual grammar of the rest of the language (or am I dreaming)?
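A minimal sketch of what I mean by two phases, assuming a toy template syntax of {{name}} with no parameters. The TEMPLATES table, the expand function and the depth limit are all made up for this example and say nothing about how transclusion is really implemented.

```python
import re

# Hypothetical template table; "open-bold" deliberately emits half a construct.
TEMPLATES = {"open-bold": "'''", "greeting": "hello"}

def expand(text, templates, depth=0, max_depth=10):
    """Phase 1: replace {{name}} with the template body, recursively.

    Unknown names are left untouched; max_depth guards against
    self-referencing templates.
    """
    if depth > max_depth:
        return text

    def repl(match):
        name = match.group(1).strip()
        if name not in templates:
            return match.group(0)
        return expand(templates[name], templates, depth + 1, max_depth)

    return re.sub(r"\{\{([^{}|]+)\}\}", repl, text)

# Phase 2 (the formal grammar) would only ever see the expanded text.
expanded = expand("{{open-bold}}{{greeting}}'''", TEMPLATES)
```

Here the opening ''' comes from a template and the closing one from the page, so the bold construct only exists once phase 1 has finished. That is the same situation as a C macro expanding to an unbalanced brace: the main grammar never sees the macro, only the result.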
A URL-like thing that was typed without any particular surrounding syntax (it gets autolinked). Similar lookahead would presumably be necessary for RFCs, ISBNs, and PMIDs (okay, that's enough to convince me to agree that they should be ditched :) ). In general, a lookahead of no more than one character is considered desirable.
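The lookahead point can be illustrated with a small sketch that counts how many characters a scanner must inspect before it knows whether an autolink prefix starts at a given position. The prefix list and the function name chars_needed_to_decide are assumptions for the example, not the set of protocols the parser actually recognises.

```python
# Assumed subset of autolinked prefixes, for illustration only.
PREFIXES = ("http://", "https://", "ftp://")

def chars_needed_to_decide(text, pos, prefixes=PREFIXES):
    """Count characters inspected from pos until a prefix matches or all fail."""
    n = 0
    candidates = list(prefixes)
    while candidates:
        if pos + n >= len(text):
            return n  # ran out of input before any prefix completed
        ch = text[pos + n]
        # Keep only the prefixes that still agree with the input so far.
        candidates = [p for p in candidates if len(p) > n and p[n] == ch]
        n += 1
        if any(len(p) == n for p in candidates):
            return n  # a whole prefix has been matched
    return n

needed_for_url = chars_needed_to_decide("http://example.org", 0)
needed_for_word = chars_needed_to_decide("hello", 0)
```

An ordinary word starting with "h" is rejected after two characters, but confirming "http://" takes all seven, well past the one-character lookahead considered desirable.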
What can I say, I don't like these "freelinks". They just don't seem clean. Normal text which spontaneously turns into a link without any special punctuation or anything. Hmm.
He *seems* to be saying that you'd have to make special rules for each allowed HTML tag, and presumably each allowed attribute and property thereof, and maybe even every combination of them (!). Would there be any advantage in leaving those out of the grammar and keeping Parser and Sanitizer separate as they are now?
I don't get why we even allow HTML tags, other than convenience. It's not like the final output of the encyclopaedia is guaranteed to bear any resemblance to a web page...
For instance, why do we support <b>? We have '''... It's just not clean. (I dare someone to reply that ''' is semantic markup...heh.)
Steve