Daniel Kinzler wrote:
Steve Bennett wrote:
magic_word: UNDERSCORE UNDERSCORE magic_word_text UNDERSCORE UNDERSCORE -> ^(MAGIC_WORD magic_word_text);
...
It would only be a problem if the contents of the magic word interfered with the lexer - say, a combination of letters and other punctuation. But if the available shapes were predefined (e.g. hyphen hyphen letters digit hyphen hyphen), then the grammar could deal with them, and the letters themselves could be defined at runtime.
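To make that split concrete, here is a rough Python sketch (not MediaWiki code, and not the ANTLR grammar itself): the shape of the token is fixed up front, while the set of valid words is only consulted at runtime. The names MAGIC_SHAPE and find_magic_words are made up for illustration.

```python
import re

# Fixed, "grammar-level" shape: two underscores, letters, two underscores.
# This part would live in the lexer and never change.
MAGIC_SHAPE = re.compile(r"__([A-Za-z]+)__")


def find_magic_words(text, known_words):
    """Return the recognised magic words in text, in order of appearance.

    known_words is a set of upper-case names supplied at runtime.
    Anything that has the right shape but is not in the set is ignored,
    i.e. it would be left alone as plain text.
    """
    found = []
    for match in MAGIC_SHAPE.finditer(text):
        word = match.group(1).upper()
        if word in known_words:
            found.append(word)
    return found
```

Called as find_magic_words("intro __NOTOC__ body __BOGUS__", {"NOTOC", "FORCETOC"}), this picks up NOTOC and skips __BOGUS__, since only the word list, not the shape, decides what counts.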
Magic words don't have to have the form __XXX__ - they can be characterized by any regular expression. Consider how ISBN and RFC are treated - those are magic words too... Oh, and please consider that the patterns are frequently localizable (and are thus maintained in MediaWiki's messages files): French, for example, allows __AUCUNETABLE__ for __NOTOC__. The same goes for #REDIRECT, btw: Dutch allows #DOORVERWIJZING, etc...
I'm not entirely sure whether extensions are free to define magic words using *any* pattern, but I think this is so. MagicWord.php is entirely regex-based, which would mean that either your parser will only support some types of magic words, or it needs a way to hook into the actual grammar.
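For illustration only, the regex-based view could be sketched like this in Python. This is not how MagicWord.php is written; MagicWordRegistry and the patterns below are assumptions made up for the example, and the ISBN pattern in particular is a deliberate simplification.

```python
import re


class MagicWordRegistry:
    """Hypothetical registry: each magic word is just a regular expression."""

    def __init__(self):
        self._entries = []  # list of (canonical_id, compiled_pattern)

    def register(self, canonical_id, pattern):
        # Core code, a localization file, or an extension could all call this.
        self._entries.append((canonical_id, re.compile(pattern)))

    def scan(self, text):
        """Return (position, canonical_id, matched_text) tuples, sorted by position."""
        hits = []
        for canonical_id, pattern in self._entries:
            for match in pattern.finditer(text):
                hits.append((match.start(), canonical_id, match.group(0)))
        return sorted(hits)


registry = MagicWordRegistry()
# Localized aliases fold into one pattern per canonical word.
registry.register("notoc", r"__(?:NOTOC|AUCUNETABLE)__")
registry.register("redirect", r"#(?:REDIRECT|DOORVERWIJZING)\b")
# Not of the __XXX__ form at all (simplified pattern, an assumption).
registry.register("isbn", r"ISBN [0-9Xx-]+")
```

The point of the sketch is that nothing here goes through a fixed token shape: a grammar-based parser would need an equivalent of register() to accept patterns like these.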
I think they more or less can. But that could be restricted. The few people defining their own magic words will mostly have copied the existing format, so if you're using a magic word that isn't in __XXX__ form, you're out of luck.