On Thu, Aug 17, 2006 at 05:12:22PM +0200, Steve Bennett wrote:
> A URL-like thing that was typed without any particular surrounding syntax (it gets autolinked). Similar lookahead would presumably be necessary for RFCs, ISBNs, and PMIDs (okay, that's enough to convince me to agree that they should be ditched :) ). In general, a lookahead of no more than one character is considered desirable.
What can I say, I don't like these "freelinks". They just don't seem clean. Normal text which spontaneously turns into a link without any special punctuation or anything. Hmm.
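To make the lookahead point concrete, here's a rough Python sketch of that kind of recognizer -- not MediaWiki's code; the regex and the peel-punctuation-off-the-tail rule are my own guesses -- where the only decision beyond the URL characters themselves is what to do with one trailing character:

    import re

    # Illustrative sketch only (my regex, not MediaWiki's): find bare
    # http/https URLs in running text and wrap them in a link.
    URL_RE = re.compile(r'\bhttps?://[^\s<>"]+')

    def autolink(text):
        def repl(m):
            url = m.group(0)
            # The only "lookahead" decision: peel sentence punctuation
            # off the tail so "http://example.org/wiki," links cleanly.
            trailing = ''
            while url and url[-1] in '.,;:!?)':
                trailing = url[-1] + trailing
                url = url[:-1]
            return '<a href="%s">%s</a>%s' % (url, url, trailing)
        return URL_RE.sub(repl, text)

    print(autolink("See http://example.org/wiki, then reply."))
    # See <a href="http://example.org/wiki">http://example.org/wiki</a>, then reply.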
> Parsers don't have to be single pass... and ours isn't now.
Is it?
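For what it's worth, "not single pass" in the small can be sketched like this (illustrative Python, not our actual Parser; parse is just a made-up name): one pass lifts <nowiki> spans out of harm's way, a second pass does the inline markup, and the protected text goes back in at the end:

    import re

    # Sketch of a two-pass arrangement (not our actual Parser): pass 1
    # hides <nowiki> spans behind placeholders, pass 2 handles inline
    # markup, then the protected text is restored verbatim.
    def parse(text):
        saved = []
        def hide(m):
            saved.append(m.group(1))
            return '\x00%d\x00' % (len(saved) - 1)
        text = re.sub(r'<nowiki>(.*?)</nowiki>', hide, text, flags=re.S)  # pass 1
        text = re.sub(r"'''(.+?)'''", r'<strong>\1</strong>', text)       # pass 2
        return re.sub(r'\x00(\d+)\x00',
                      lambda m: saved[int(m.group(1))], text)             # restore

    print(parse("'''bold''' but <nowiki>'''not this'''</nowiki>"))
    # <strong>bold</strong> but '''not this'''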
> He *seems* to be saying that you'd have to make special rules for each allowed HTML tag, and presumably each allowed attribute and property thereof, and maybe even every combination of them (!). Would there be any advantage in leaving those out of the grammar and keeping Parser and Sanitizer separate as they are now?
> I don't get why we even allow HTML tags, other than convenience. It's not like the final output of the encyclopaedia is guaranteed to bear any resemblance to a web page...
> For instance, why do we support <b>? We have '''... It's just not clean. (I dare someone to reply that ''' is semantic markup...heh.)
It is; I believe it renders as <strong>, not <b>.
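And the Parser/Sanitizer split asked about above is easy to picture. A rough Python sketch (the whitelist is a stand-in, not the real Sanitizer's list, and parse_inline/sanitize are names I made up): the grammar stage knows only wiki markup, and a later pass escapes any tag it doesn't recognize, so no per-tag grammar rules are needed:

    import re

    ALLOWED = {'b', 'i', 'strong', 'em'}   # stand-in list, not Sanitizer's

    def parse_inline(text):
        # The grammar stage knows only wiki markup; ''' comes out as
        # a semantic <strong>, no per-tag HTML rules involved.
        return re.sub(r"'''(.+?)'''", r'<strong>\1</strong>', text)

    def sanitize(html):
        # A later, separate pass: escape any tag not on the whitelist.
        def check(m):
            if m.group(2).lower() in ALLOWED:
                return m.group(0)
            return m.group(0).replace('<', '&lt;').replace('>', '&gt;')
        return re.sub(r'<(/?)([a-zA-Z]+)([^<>]*)>', check, html)

    print(sanitize(parse_inline("'''x''' <b>ok</b> <script>bad</script>")))
    # <strong>x</strong> <b>ok</b> &lt;script&gt;bad&lt;/script&gt;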
Cheers,
-- jra