[Wikitext-l] Wikitext Madness of the Day: Internal Links
Daniel Friesen
lists at nadir-seen-fire.com
Sat Aug 28 12:32:57 UTC 2010
Andreas Jonsson wrote:
> ...
> Trying to reproduce this behavior in a new parser would, of course, be
> insane. In fact, the current MediaWiki parser does not seem to parse
> links in linear time using a linear amount of memory. My test server
> failed to process a preview of an article consisting of about 24000
> links of the form [[a]]. It was working hard before it, I
> guess, ran out of memory. As a comparison it parsed over 38000 italic
> a's, ''a'', without problems.
>
> So, what is the reasonable thing to do? First of all it should be
> pointed out that block elements are not allowed inside link text:
>
> http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#dtdentry_xhtml1-strict.dtd_a
>
> This suggests that any sane wikitext should not allow a link to
> continue past the end of the inlined text where it is located. Even
> better is to say that the sequence [[Link| always opens up a new link
> and that 'end of inline text' will implicitly close the link if it is
> still open. That will not require any lookahead to parse. It would
> be consistent with the format parsing to only allow it to run to the
> end of line, though. Also, currently paragraphs and list elements
> aren't rendered inside link text, unless enclosed or preceded by a
> table. So, unless tables inside link text are a widely used feature,
> such a change might not break that many pages.
>
> /Andreas
>
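To make the no-lookahead rule quoted above concrete, here is a minimal
Python sketch. It is only an illustration, not MediaWiki's parser: the
function name parse_links and the token tuples are hypothetical, and it
assumes that a second [[ inside an open link closes that link first. It
only ever looks at the current two characters, so it runs in a single
left-to-right pass over the line.

```python
def parse_links(line):
    """Tokenize one line of inline text into text and link tokens.

    Hypothetical sketch: "[[" always opens a link, "]]" closes it, and
    the end of the line implicitly closes a link that is still open.
    """
    tokens = []      # ("text", str) or ("link", target, label)
    state = "text"   # "text", "target" (before "|") or "label" (after "|")
    target = []
    buf = []

    def flush():
        # Emit whatever is pending and return to plain-text state.
        nonlocal state, target, buf
        if state == "text":
            if buf:
                tokens.append(("text", "".join(buf)))
        elif state == "target":
            name = "".join(buf)
            tokens.append(("link", name, name))  # no label: reuse the target
        else:
            tokens.append(("link", "".join(target), "".join(buf)))
        state, target, buf = "text", [], []

    i = 0
    while i < len(line):
        pair = line[i:i + 2]
        if pair == "[[":
            flush()              # assumption: a new "[[" closes any open link
            state = "target"
            i += 2
        elif pair == "]]" and state != "text":
            flush()
            i += 2
        elif line[i] == "|" and state == "target":
            target, buf = buf, []
            state = "label"
            i += 1
        else:
            buf.append(line[i])
            i += 1
    flush()                      # end of inline text closes an open link
    return tokens
```

For example, parse_links("see [[a]] and [[Link|text") yields a text
token, a link to "a", another text token, and a link to "Link" labelled
"text", even though the last link is never explicitly closed.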
Keep in mind that MediaWiki is switching to html5. Browsers don't even
parse according to xhtml rules, and the xhtml doctype is nothing more
than a hint to validators, which aren't essential (and not every page
validates properly anyway). So I don't believe xhtml rules, with the
exception of valid xml output, still apply where html5 overrides them
(html5 attempts to define html parsing as it should be, based on how it
already is, iirc).
In this case, html5 gives <a> a "transparent" content model: block
elements are valid inside an <a> if they would be valid in the same
place without the <a>. So as long as you don't output the <p>, which you
wouldn't do anyway if you got the <div> directly, then
<a ...><div>...</div></a> is valid.
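A rough way to picture the transparent rule is the Python sketch below.
The element sets are a tiny, hypothetical subset of the real html5
content models, and the function names are made up for illustration. The
one extra restriction it encodes is that html5 does not allow
interactive content such as another <a> or a <button> inside an <a>.

```python
# Hypothetical, heavily simplified content-model tables (not the full spec).
FLOW_PARENTS = {"body", "div", "li", "td", "section"}  # accept block + inline
PHRASING_PARENTS = {"p", "span", "em", "h1"}           # accept inline only
BLOCK = {"div", "p", "ul", "table"}
INLINE = {"span", "em", "b", "img"}
INTERACTIVE = {"a", "button"}                          # also inline-level


def allowed_directly(parent, child):
    """Would `child` be valid as a direct child of `parent`?"""
    if parent in FLOW_PARENTS:
        return child in BLOCK or child in INLINE or child in INTERACTIVE
    if parent in PHRASING_PARENTS:
        return child in INLINE or child in INTERACTIVE
    raise ValueError("unknown parent element: " + parent)


def allowed_in_anchor(anchor_parent, child):
    """Transparent rule: <a> accepts what its own parent would accept,
    except interactive content."""
    if child in INTERACTIVE:
        return False
    return allowed_directly(anchor_parent, child)
```

With these tables, allowed_in_anchor("div", "div") is True, while
allowed_in_anchor("p", "div") is False, which is why the surrounding
<p> has to be left out.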
Just making note...
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]