I'm pleased to report that my ANTLR grammar outperforms* the current
MediaWiki parser on the following pathological text:
[[[[image:foo.jpg|thumb|[[[o]]][[foo||]]|[[image:bar.jpg|thumb|[[roo
my doo|zoo|]]]]]]]]]
It's really amazing what you discover about Wikitext when you sit down
to analyse it like this. For example, a square bracket - [ - is:
- the start of an external link, if the rest of it is present, and not
in a context where external links are forbidden (notably, captions of
internal links or other external links), and not inside a nowiki tag
- part of the start of an internal link, as long as the rest is
present, and it couldn't be interpreted as an external link, and it is
in an appropriate context
- a literal otherwise - that is, in any non-linkable context, not
followed by the appropriate tags to make it a link, or inside a nowiki tag
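To make the three bracket cases concrete, here is a minimal sketch in Python (not the ANTLR grammar itself, and not MediaWiki's real logic). The function name, the in_nowiki and in_caption flags, and the URL pattern are all made up for illustration, and "the rest is present" is reduced to a very crude check:

```python
import re

# Hypothetical, heavily simplified pattern for "[url optional caption]".
EXTERNAL_LINK = re.compile(r"\[(?:https?|ftp)://[^\s\]]+(?: [^\]]*)?\]")


def classify_open_bracket(text, pos, in_nowiki=False, in_caption=False):
    """Classify the '[' at text[pos] using the three toy cases.

    For an internal link, call this on the first of the two brackets.
    """
    if text[pos] != "[":
        raise ValueError("text[pos] must be '['")
    # Inside nowiki, a bracket is always a literal.
    if in_nowiki:
        return "literal"
    # External link: the rest must be present, and captions forbid it.
    if not in_caption and EXTERNAL_LINK.match(text, pos):
        return "external_link_start"
    # Internal link: "[[" with some closing "]]" later on (crude check).
    if text.startswith("[[", pos) and "]]" in text[pos + 2:]:
        return "internal_link_start"
    # Anything else is just a character.
    return "literal"
```

Even this toy version needs context passed in from the caller, which is exactly what makes the real thing awkward to express as a grammar.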
A pipe - | - is:
- an option separator for an image, provided that it's not within an
embedded object such as an internal link or another image, and provided
that it's not within a nowiki
- a link caption separator, provided that it's not in nowiki tags
- any of a dozen other cases that I haven't dealt with yet, like
tables, templates, parser functions, categories, ...
- literal otherwise.
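The pipe cases that are handled so far can be sketched the same way. This is again a toy in Python under stated assumptions, not the grammar: the classify_pipes name and the role strings are invented, an image is anything starting with "[[image:", every pipe directly inside a plain link is called a caption separator, and tables, templates and the rest are ignored entirely:

```python
def classify_pipes(text):
    """Return (index, role) for each '|' in text, using toy nesting rules."""
    roles = []
    stack = []  # one entry per open "[[": either "image" or "link"
    in_nowiki = False
    i = 0
    while i < len(text):
        if text.startswith("<nowiki>", i):
            in_nowiki = True
            i += len("<nowiki>")
        elif text.startswith("</nowiki>", i):
            in_nowiki = False
            i += len("</nowiki>")
        elif in_nowiki:
            # Nothing is markup inside nowiki, so a pipe is a literal.
            if text[i] == "|":
                roles.append((i, "literal"))
            i += 1
        elif text.startswith("[[", i):
            is_image = text[i + 2:i + 8].lower() == "image:"
            stack.append("image" if is_image else "link")
            i += 2
        elif text.startswith("]]", i) and stack:
            stack.pop()
            i += 2
        else:
            if text[i] == "|":
                if not stack:
                    roles.append((i, "literal"))
                elif stack[-1] == "image":
                    # Only the innermost open object decides the role.
                    roles.append((i, "image_option_separator"))
                else:
                    roles.append((i, "link_caption_separator"))
            i += 1
    return roles
```

For example, classify_pipes("[[image:foo.jpg|thumb|[[bar|baz]]]]") reports two image option separators followed by one link caption separator, because the third pipe sits inside the embedded link rather than directly inside the image.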
It's fun! I think...
Steve
* The current parser gives up. ANTLR, after a monumental struggle
involving 21 levels of method call and a bit of backtracking, parses
it correctly.