[Wikitext-l] Wikitext Madness of the Day: Internal Links

Daniel Friesen lists at nadir-seen-fire.com
Sat Aug 28 12:32:57 UTC 2010


Andreas Jonsson wrote:
> ...
> Trying to reproduce this behavior in a new parser would, of course, be
> insane.  In fact, the current MediaWiki parser does not seem to parse
> links in linear time using a linear amount of memory.  My test server
> failed to process a preview of an article consisting of about 24000
> links of the form [[a]]. It was working hard before it, I
> guess, ran out of memory.  As a comparison it parsed over 38000 italic
> a's, ''a'', without problems.
>
> So, what is the reasonable thing to do?  First of all it should be
> pointed out that block elements are not allowed inside link text:
>
> http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#dtdentry_xhtml1-strict.dtd_a
>
> This suggests that any sane wikitext should not allow a link to
> continue past the end of the inlined text where it is located.  Even
> better is to say that the sequence [[Link| always opens up a new link
> and that 'end of inline text' will implicitly close the link if it is
> still open.  That will not require any lookahead to parse.  It would
> be consistent with the format parsing to only allow it to run to the
> end of line, though.  Also, currently paragraphs and list elements
> aren't rendered inside link text, unless enclosed or preceded by a
> table.  So, unless tables inside link text are a widely used feature,
> such a change might not break that many pages.
>
> /Andreas
>   
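
For reference, the quoted proposal ("[[Link|" always opens a link, and
the end of the inline text implicitly closes it) can be sketched as a
single left-to-right pass with no lookahead beyond the opening token.
This is only an illustration, not MediaWiki's parser: the token regex,
the function name render_line, the use of the raw target as href, and
the choice to close an open link when a new one starts are all
assumptions of mine. Plain [[a]] links are deliberately left
unhandled.

```python
import re
from html import escape

# Hypothetical token set: "[[target|" opens a link, "]]" closes one.
TOKEN = re.compile(r"\[\[([^\[\]|\n]+)\||\]\]")


def render_line(line: str) -> str:
    """Render one line of inline text in a single pass (illustrative only)."""
    out = []
    pos = 0
    link_open = False
    for m in TOKEN.finditer(line):
        # Emit the literal text between the previous token and this one.
        out.append(escape(line[pos:m.start()]))
        pos = m.end()
        if m.group(1) is not None:
            # Opening token. Assumption: links do not nest, so an
            # already open link is closed before the new one starts.
            if link_open:
                out.append("</a>")
            out.append('<a href="%s">' % escape(m.group(1), quote=True))
            link_open = True
        elif link_open:
            # Explicit "]]" closes the open link.
            out.append("</a>")
            link_open = False
        else:
            # A stray "]]" with no open link is kept as literal text.
            out.append(escape(m.group(0)))
    out.append(escape(line[pos:]))
    if link_open:
        # End of inline text implicitly closes the link.
        out.append("</a>")
    return "".join(out)
```

Every character is visited once and the only state is one boolean, so
time and memory stay linear in the length of the line.
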
Keep in mind that MediaWiki is switching to HTML5. Browsers don't even
parse according to XHTML rules, and the XHTML doctype is nothing more
than a hint to validators, which aren't essential (and not every page
validates properly anyway). So I don't believe XHTML rules, with the
exception of valid XML output, still apply where HTML5 has dropped
them (HTML5 attempts to define how HTML parsing should work, based on
how it already works, iirc).
In this case, HTML5 gives <a> a "transparent" content model: block
elements are valid inside an <a> if they would be valid there without
the <a>. So as long as you don't wrap the link in a <p>, which you
wouldn't do anyway if you were outputting the <div> directly, then
<a ...><div>...</div></a> is valid.
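
As a rough sketch of that rule: the <a> takes on the content model of
whatever element contains it, so the check is really a check against
the parent. The element sets below are a tiny illustrative subset that
I picked for the example, not the full HTML5 category tables, and
link_may_contain is a hypothetical helper name.

```python
# Deliberately small, illustrative subsets of the HTML5 categories.
PHRASING = {"span", "em", "strong", "img", "a"}
FLOW = PHRASING | {"div", "p", "table", "ul"}

# What each context element accepts as children (simplified).
CONTENT_MODEL = {"div": FLOW, "p": PHRASING, "span": PHRASING}


def link_may_contain(parent: str, child: str) -> bool:
    """Check <parent><a><child/></a></parent> under the transparent rule."""
    if child == "a":
        # Links may not be nested inside links.
        return False
    # Transparent: the <a> accepts exactly what its parent would accept.
    return child in CONTENT_MODEL[parent]
```

So link_may_contain("div", "div") holds, while
link_may_contain("p", "div") does not, which is the "don't output the
<p>" condition above.
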

Just making note...

-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



