[Wikitext-l] A pathological case

Magnus Manske magnusmanske at googlemail.com
Wed Nov 28 11:01:29 UTC 2007


On Nov 25, 2007 11:29 PM, Steve Bennett <stevagewp at gmail.com> wrote:
> I'm pleased to report that my ANTLR grammar outperforms* the current
> mediawiki parser on the following pathological text:
>
> [[[[image:foo.jpg|thumb|[[[o]]][[foo||]]|[[image:bar.jpg|thumb|[[roo
> my doo|zoo|]]]]]]]]]
>
> It's really amazing what you discover about Wikitext when you sit down
> to analyse it like this. For example, a square bracket - [ - is:
> - the start of an external link, if the rest of it is present, and not
> in a context where external links are forbidden (notably, captions of
> internal links or other external links), and not inside a nowiki tag
> - part of the start of an internal link, as long as the rest is
> present, and it couldn't be interpreted as an external link, and in an
> appropriate context
> - a literal otherwise - that is, in any non-linkable context, not
> followed by the appropriate tags to make it a link, or inside a nowiki
>
> A pipe - | - is:
> - an option separator for an image, provided that it's not within an
> embedded object such as internal link or another image, and provided
> that it's not within a nowiki
> - a link caption separator, provided that it's not in nowiki tags
> - any of a dozen other cases that I haven't dealt with yet, like
> tables, templates, parser functions, categories, ...
> - literal otherwise.
>
> It's fun! I think...
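
The disambiguation rules quoted above can be sketched as a toy classifier. This is a minimal, hypothetical illustration in Python, not Steve's ANTLR grammar, wiki2xml, or the MediaWiki parser. The function names, the context flags (in_nowiki, external_allowed, links_allowed), the construct names on the stack ("image", "link") and the short URL-scheme list are all assumptions. Only the cases listed in the quote are handled; tables, templates and the rest fall through to "literal".

```python
import re
from enum import Enum

# Assumption: a tiny stand-in for the (configurable, much longer)
# list of URL schemes that can start an external link.
URL_START = re.compile(r"(?:https?|ftp)://", re.IGNORECASE)


class Role(Enum):
    EXTERNAL_LINK_START = "external link start"
    INTERNAL_LINK_START = "internal link start"
    IMAGE_OPTION_SEP = "image option separator"
    CAPTION_SEP = "link caption separator"
    LITERAL = "literal"


def classify_bracket(text, i, in_nowiki=False, external_allowed=True,
                     links_allowed=True):
    """Guess the role of the '[' at text[i] using simplified rules."""
    assert text[i] == "["
    # Inside nowiki or in a non-linkable context, '[' is always literal.
    if in_nowiki or not links_allowed:
        return Role.LITERAL
    if text.startswith("[[", i):
        close = text.find("]]", i + 2)
        # "[[" opens an internal link only if a "]]" follows and the
        # target does not look like a URL (i.e. an external link).
        if close != -1 and not URL_START.match(text, i + 2):
            return Role.INTERNAL_LINK_START
    # A single '[' followed by a URL and a later ']' is an external link,
    # unless the context (e.g. a link caption) forbids external links.
    if external_allowed and URL_START.match(text, i + 1) and "]" in text[i + 1:]:
        return Role.EXTERNAL_LINK_START
    return Role.LITERAL


def classify_pipe(open_constructs, in_nowiki=False):
    """Guess the role of a '|' given the enclosing constructs.

    open_constructs is a hypothetical stack, innermost last,
    e.g. ["image", "link"] for a link inside an image caption.
    """
    if in_nowiki or not open_constructs:
        return Role.LITERAL
    innermost = open_constructs[-1]
    # Only the innermost construct owns the pipe, so a pipe inside a
    # link embedded in an image caption belongs to the link.
    if innermost == "image":
        return Role.IMAGE_OPTION_SEP
    if innermost == "link":
        return Role.CAPTION_SEP
    # Tables, templates, parser functions, categories: not handled.
    return Role.LITERAL
```

With these toy rules, in "[[http://example.org]]" the first "[" comes out as a literal and the second as the start of an external link. Building the open_constructs stack for something like the pathological example still needs a real parser; the sketch only shows the per-character decision once the context is known.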


FWIW, my wiki2xml doesn't give up either, and generates XML very
quickly. However, there's still a fluke in there (more than one ;-)
that causes "[[image:foo.jpg" to be a link target. Might be the
correct behaviour, though, when you think about it...

I'll look at this more closely eventually; in any case, it already
generates "good" XML, which IMHO is the most important thing.

Magnus
