Steve Bennett wrote:
...
> The trouble there is that <ref> for example can contain
> wikitext...which needs to be parsed. e.g.:
> <ref>''The origin of species'', Darwin</ref>
> So at a minimum I think we would need to distinguish those extensions
> whose internal text needs to be parsed?
No. If a tag-style extension wants to support wiki text, it has to explicitly
invoke a new parser pass on the text contained between the tags. The text MUST
NOT be parsed/transformed before being passed to the extension, and what the
extension returns must not be parsed either (the latter is only partially true
for the current parser, but I would call that a bug, not a feature - see bug 8997).
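To make that contract concrete, here is a minimal Python sketch under stated assumptions. The hook table, the hook names and parse_wikitext() are all hypothetical, and parse_wikitext() only understands ''italics''. The parser hands the raw text between the tags to the hook. The ref hook explicitly runs a parser pass on it, the code hook does not, and whatever a hook returns is spliced into the output without being parsed again.

```python
import html
import re
from typing import Callable, Dict

# Hypothetical stand-in for a full parser pass; it only knows ''italics''.
def parse_wikitext(text: str) -> str:
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)

TagHook = Callable[[str], str]  # raw inner text in, final HTML out

def ref_hook(inner: str) -> str:
    # This (hypothetical) extension opts in to wikitext by invoking a parser pass itself.
    return '<span class="reference">' + parse_wikitext(inner) + "</span>"

def code_hook(inner: str) -> str:
    # This (hypothetical) extension wants literal text, so it never calls the parser.
    return html.escape(inner, quote=False)

# Hypothetical registry of tag-style extensions.
TAG_HOOKS: Dict[str, TagHook] = {"ref": ref_hook, "code": code_hook}

def render(text: str) -> str:
    tag = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    out, pos = [], 0
    for m in tag.finditer(text):
        hook = TAG_HOOKS.get(m.group(1))
        if hook is None:
            continue  # not an extension tag; left in the surrounding text
        out.append(parse_wikitext(text[pos:m.start()]))  # text outside the tags
        out.append(hook(m.group(2)))  # inner text passed unparsed; result not re-parsed
        pos = m.end()
    out.append(parse_wikitext(text[pos:]))
    return "".join(out)
```

With these definitions, the <ref> example above comes out with the title italicized, while <code>''x''</code> keeps its quote marks, because nothing re-parses hook output.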
> 2) "parser functions" which conform to an extended template syntax:
> ...
> Afaik, these are converted by the preprocessor (recently rewritten by
> Tim), and are completely invisible to the parser?
I don't know. I don't see why parser functions should be handled by the
preprocessor while tag hooks are not. But maybe this is so.
> magic_word: UNDERSCORE UNDERSCORE magic_word_text UNDERSCORE UNDERSCORE
>     -> ^(MAGIC_WORD magic_word_text);
> ...
> It would only be a problem if the contents of the magic word
> interfered with the lexer - say a combination of letters and other
> punctuation. But if the available combinations were predefined (eg,
> hyphen hyphen letters digit hyphen hyphen) then they can be dealt
> with, and the letters themselves defined at runtime.
Magic words don't have to have the form __XXX__ - they can be characterized by
any regular expression. Consider how ISBN and RFC are treated - those are magic
words too... Oh, and please consider that the patterns are frequently localizable
(and are thus maintained in MediaWiki's messages files): French, for example,
allows __AUCUNETABLE__ for __NOTOC__. The same goes for #REDIRECT btw: Dutch
allows #DOORVERWIJZING, etc...
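As a rough illustration of why the lexer cannot hard-code the spelling, here is a Python sketch in which each magic word is an ID with a per-language list of synonyms, compiled into regexes at runtime. The table layout, the IDs and the function names are assumptions for illustration, not MediaWiki's actual data structures. Further matching rules, such as case sensitivity or where #REDIRECT may appear, are ignored here.

```python
import re
from typing import Dict, List, Set

# Hypothetical per-language synonym tables, loosely modelled on the
# localizable patterns kept in MediaWiki's messages files. Only synonyms
# mentioned in this thread are listed; the IDs are illustrative.
MAGIC_WORDS: Dict[str, Dict[str, List[str]]] = {
    "en": {"notoc": ["__NOTOC__"], "redirect": ["#REDIRECT"]},
    "fr": {"notoc": ["__NOTOC__", "__AUCUNETABLE__"], "redirect": ["#REDIRECT"]},
    "nl": {"notoc": ["__NOTOC__"], "redirect": ["#REDIRECT", "#DOORVERWIJZING"]},
}

def compile_magic_words(lang: str) -> Dict[str, "re.Pattern[str]"]:
    # One alternation per magic word ID, built at runtime from the table.
    return {
        mw_id: re.compile("|".join(re.escape(s) for s in synonyms))
        for mw_id, synonyms in MAGIC_WORDS[lang].items()
    }

def magic_words_in(text: str, lang: str) -> Set[str]:
    # IDs of all magic words that occur anywhere in text for this language.
    return {mw_id for mw_id, pat in compile_magic_words(lang).items() if pat.search(text)}
```

The same source text can thus mean different things depending on the wiki's language: __AUCUNETABLE__ is "notoc" on a French wiki and plain text on an English one.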
I'm not entirely sure if extensions are free to define magic words using *any*
pattern, but I think this is so. MagicWord.php is entirely regex-based. That
would mean that either your parser will only support some types of magic words,
or it needs a way to hook into the actual grammar.
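If extensions can register arbitrary patterns, one way to cope is to let the inline scanner consult a registry that is filled at runtime rather than baking the patterns into the grammar. The Python sketch below assumes a hypothetical register_magic_pattern() hook and uses simplified ISBN and RFC regexes with placeholder <magic> output. These are not MediaWiki's actual patterns or markup.

```python
import re
from typing import Callable, List, Tuple

# A handler turns a match into output; both the hook API and the output
# markup below are hypothetical.
Handler = Callable[["re.Match[str]"], str]
_REGISTRY: List[Tuple["re.Pattern[str]", Handler]] = []

def register_magic_pattern(regex: str, handler: Handler) -> None:
    # Called by extensions at runtime; the scanner code itself stays fixed.
    _REGISTRY.append((re.compile(regex), handler))

def scan_inline(text: str) -> str:
    # At each position, try registered patterns in registration order;
    # otherwise copy one character through unchanged.
    out: List[str] = []
    pos = 0
    while pos < len(text):
        for pattern, handler in _REGISTRY:
            m = pattern.match(text, pos)
            if m and m.end() > pos:  # ignore empty matches to avoid looping
                out.append(handler(m))
                pos = m.end()
                break
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)

# Simplified, illustrative patterns (not MediaWiki's real ISBN/RFC rules).
register_magic_pattern(r"ISBN ([0-9Xx-]{10,17})",
                       lambda m: f'<magic type="isbn">{m.group(1)}</magic>')
register_magic_pattern(r"RFC (\d+)",
                       lambda m: f'<magic type="rfc">{m.group(1)}</magic>')
```

Trying patterns in registration order at each position is the simplest possible policy; a real grammar would also need a rule for overlapping or ambiguous matches, and for word boundaries.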
Oh, and "variables" like {{PAGENAME}} are treated as magic words internally,
though that wouldn't have to be so. I would probably use the template mechanism,
and simply intercept the use of special names.
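A minimal Python sketch of that alternative, assuming a hypothetical in-memory template store and variable table: the expander checks the name against the table of special names first, and only falls back to ordinary template lookup if it is not one of them. Leaving unknown names untouched, and ignoring parameters and nesting, are simplifications chosen for the sketch.

```python
import re
from typing import Callable, Dict

# Hypothetical stand-in for the Template: namespace.
TEMPLATES: Dict[str, str] = {"Greeting": "Hello from a template"}

# Hypothetical table of special names, each computed from the page context.
VARIABLES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "PAGENAME": lambda ctx: ctx["title"],
    "NAMESPACE": lambda ctx: ctx.get("namespace", ""),
}

def expand_templates(text: str, ctx: Dict[str, str]) -> str:
    def expand(m: "re.Match[str]") -> str:
        name = m.group(1).strip()
        if name in VARIABLES:  # special name: intercepted, never looked up as a page
            return VARIABLES[name](ctx)
        if name in TEMPLATES:  # ordinary transclusion
            return TEMPLATES[name]
        return m.group(0)  # unknown name: left as-is (a simplification)
    # Only handles simple {{Name}} calls without parameters or nesting.
    return re.sub(r"\{\{([^{}|]+)\}\}", expand, text)
```

With this approach the grammar only needs to know about {{...}}; which names are "variables" is decided by a runtime table.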
-- Daniel