2010-08-28 14:32, Daniel Friesen wrote:
Andreas Jonsson wrote:
... Trying to reproduce this behavior in a new parser would, of course, be insane. In fact, the current MediaWiki parser does not seem to parse links in linear time using a linear amount of memory. My test server failed to process a preview of an article consisting of about 24000 links of the form [[a]]. It was working hard before it, I guess, ran out of memory. As a comparison, it parsed over 38000 italic a's, ''a'', without problems.
So, what is the reasonable thing to do? First of all it should be pointed out that block elements are not allowed inside link text:
http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#dtdentry_xhtml1-stri...
This suggests that any sane wikitext should not allow a link to continue past the end of the inline text where it is located. Even better is to say that the sequence [[Link| always opens up a new link and that 'end of inline text' will implicitly close the link if it is still open. That will not require any lookahead to parse. It would be consistent with the format parsing to only allow it to run to the end of line, though. Also, currently paragraphs and list elements aren't rendered inside link text, unless enclosed or preceded by a table. So, unless tables inside link text are a widely used feature, such a change might not break that many pages.
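To make the proposed rule concrete, here is a minimal sketch in Python of a single left-to-right pass over one run of inline text. It is only an illustration under stated assumptions, not the MediaWiki parser or any planned implementation: the function name render_inline, the token pattern, using the raw link target as the href, and the choice to close an already open link when a new [[Link| appears are all hypothetical. Plain [[a]] links without a pipe are left out for brevity and pass through as literal text.

```python
import re
from html import escape

# Hypothetical token grammar: "[[target|" opens a link, "]]" closes it.
TOKEN = re.compile(r"\[\[([^\[\]|\n]+)\||\]\]")


def render_inline(text: str) -> str:
    """Render one run of inline text, scanning it once from left to right."""
    out = []
    link_open = False
    pos = 0
    for m in TOKEN.finditer(text):
        # Emit the literal text between the previous token and this one.
        out.append(escape(text[pos:m.start()]))
        pos = m.end()
        if m.group(1) is not None:
            # "[[target|" always opens a new link (assumption: an already
            # open link is closed first, so links never nest).
            if link_open:
                out.append("</a>")
            out.append('<a href="%s">' % escape(m.group(1), quote=True))
            link_open = True
        elif link_open:
            # "]]" closes the currently open link.
            out.append("</a>")
            link_open = False
        else:
            # Stray "]]" with no open link: keep it as literal text.
            out.append(escape(m.group(0)))
    out.append(escape(text[pos:]))
    if link_open:
        # End of inline text implicitly closes a link that is still open.
        out.append("</a>")
    return "".join(out)


print(render_inline("see [[Foo|the foo page]] now"))
print(render_inline("[[Foo|never closed"))
```

The caller would split the page into inline runs (for instance per line or per paragraph) before calling this, so an unclosed link can never swallow following block elements, and the only state carried along is a single flag.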
/Andreas
Keep in mind that MediaWiki is switching to html5. As the browsers don't even parse according to xhtml rules, and the xhtml doctype means nothing but a hint to validators (which aren't essential, and which not every page even passes anyway), I don't believe xhtml rules -- with the exception of valid xml output -- are valid if they are retracted by html5 (which attempts to define html parsing how it should be, based on how it already is, iirc). In this case, html5 defines <a> as "transparent content": block elements are valid inside of an <a> if they are valid without the <a> there. So as long as you don't output the <p>, as you would do anyway if you got the <div> directly, then <a ...><div>...</div></a> is valid.
Just making note...
That's very interesting. I didn't know that.
/Andreas