On 2/13/08, Daniel Kinzler <daniel@brightbyte.de> wrote:
No. If a tag-style extension wants to support wiki text, it has to explicitly invoke a new parser pass on the text contained between the tags. The text MUST NOT be parsed/transformed before being passed to the extension, and what the extension returns must not be parsed either (the latter is only partially true for the current parser, but I would call that a bug, not a feature - see bug 8997).
So, the parse sequence for:
* <ref> '''blah'''</ref>
basically goes:
1. Parse bullet and find <ref>...</ref>
2. Pass <ref> chunk to extension.
3. Extension processes <ref> chunk, calls parser to process the bold tags, returns something with <b>blah</b>
4. Parser continues on...
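To make that sequence concrete, here is a minimal sketch in Python (not MediaWiki's actual PHP code). Every name in it (`TAG_HOOKS`, `parse_inline`, `ref_extension`, `parse_line`) is hypothetical, and the `<sup>` wrapper is just a stand-in for whatever a real extension would return. The point is only the contract: the extension gets the raw chunk, asks for a parser pass itself, and its return value is not parsed again.

```python
import re

# Hypothetical registry mapping tag names to extension callbacks.
TAG_HOOKS = {}

# Matches <name>...</name> for a simple, non-nested tag (toy grammar).
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_inline(text):
    """Toy inline pass: only turns '''bold''' into <b>bold</b>."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


def ref_extension(raw, parser):
    """Receives the raw, unparsed chunk and explicitly requests a parser pass."""
    return "<sup>" + parser(raw.strip()) + "</sup>"


TAG_HOOKS["ref"] = ref_extension


def parse_line(line):
    prefix = suffix = ""
    if line.startswith("* "):  # step 1: parse the bullet
        prefix, suffix, line = "<li>", "</li>", line[2:]
    out, pos = [], 0
    for m in TAG_RE.finditer(line):
        name, raw = m.group(1), m.group(2)
        if name not in TAG_HOOKS:
            continue  # unknown tag: leave it to the ordinary inline pass
        out.append(parse_inline(line[pos:m.start()]))
        # steps 2-3: hand the raw chunk to the extension; its output is
        # appended as-is and never parsed again
        out.append(TAG_HOOKS[name](raw, parse_inline))
        pos = m.end()
    out.append(parse_inline(line[pos:]))  # step 4: parser continues on
    return prefix + "".join(out) + suffix


print(parse_line("* <ref> '''blah'''</ref>"))
```

An extension that never calls `parser` would get its content back untouched, which is the behaviour Daniel describes for tags that don't want wiki text.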
Magic words don't have to have the form __XXX__ - they can be characterized by any regular expression. Consider how ISBN and RFC are treated - those are magic words too... Oh and please consider that the patterns are frequently localizable
No they're not. Quite specifically, they're not - the key words (ISBN, RFC, PMID) are hardcoded into the parser code and not internationalisable. I call them "magic links" in my grammar.
(and are thus maintained in MediaWiki's messages files): French, for example, allows __AUCUNETABLE__ for __NOTOC__. The same goes for #REDIRECT btw: Dutch allows #DOORVERWIJZING, etc...
That's ok - I'd forgotten that the #REDIRECT word is a magic word though.
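The distinction between the two kinds of pattern can be sketched roughly as follows, in Python rather than PHP. The alias table and function names are hypothetical; only the French and Dutch aliases come from this thread, and I'm assuming the English form stays valid as a fallback. The magic-link pattern is a simplification for illustration, not the parser's real regex.

```python
import re

# Hypothetical alias table. Only __AUCUNETABLE__ and #DOORVERWIJZING are
# taken from the thread; keeping the English form as a fallback is an assumption.
MAGIC_WORD_ALIASES = {
    "notoc": {"en": ["__NOTOC__"], "fr": ["__AUCUNETABLE__", "__NOTOC__"]},
    "redirect": {"en": ["#REDIRECT"], "nl": ["#DOORVERWIJZING", "#REDIRECT"]},
}


def magic_word_regex(word_id, lang):
    """Build a regex from the localized aliases, falling back to English."""
    table = MAGIC_WORD_ALIASES[word_id]
    aliases = table.get(lang, table["en"])
    return re.compile("|".join(re.escape(a) for a in aliases))


# Magic links: the keywords are fixed in code, with no per-language table.
# The number pattern here is deliberately simplified.
MAGIC_LINK_RE = re.compile(r"\b(ISBN|RFC|PMID)\s+([0-9][0-9Xx-]*)")

print(bool(magic_word_regex("notoc", "fr").search("intro __AUCUNETABLE__")))
print(MAGIC_LINK_RE.search("see RFC 2616").groups())
```

So a grammar can treat magic links as fixed tokens, while magic words need a token set that is filled in per language.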
I'm not entirely sure if extensions are free to define magic words using *any* pattern, but I think this is so. MagicWord.php is entirely regex-based. Which would mean that either your parser will only support some types of magic words, or it needs a way to hook into the actual grammar.
Yes, as I discussed, there will need to be restrictions on the form of magic words, which is not a bad thing anyway.
Oh, and "variables" like {{PAGENAME}} are treated as magic words internally, though that wouldn't have to be so. I would probably use the template mechanism, and simply intercept the use of special names.
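A rough sketch of that "intercept special names" idea, in Python. The names `VARIABLES`, `CALL_RE` and `expand_text`, the page-context dict and the template store are all made up for illustration, and this toy ignores parameters and nesting entirely.

```python
import re

# Hypothetical table of special names intercepted before template lookup.
VARIABLES = {"PAGENAME": lambda ctx: ctx["title"]}

# Matches a bare {{name}} call with no parameters and no nesting (toy grammar).
CALL_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def expand_text(text, ctx, templates):
    """Expand {{name}} calls, checking the variable table first."""
    def repl(m):
        name = m.group(1).strip()
        if name in VARIABLES:  # special name: intercept it
            return VARIABLES[name](ctx)
        # ordinary template; unknown names are left untouched
        return templates.get(name, m.group(0))
    return CALL_RE.sub(repl, text)


print(expand_text("{{PAGENAME}} uses {{stub}}", {"title": "Main Page"}, {"stub": "S"}))
```

With this shape the grammar only needs one rule for `{{...}}`, and what counts as a variable is decided at expansion time.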
I'm a bit unclear on the meaning and current processing of the things involving curly braces. Can someone help me out here:
* {{template}} - totally handled by preprocessor?
* {{{1}}} - template parameter, totally handled by preprocessor?
* {{PAGENAME}} - "magic" variable? Where is it handled? Does it have to be caps?
* {{foo:blah}} - parser function? Where is it handled?
* {{defaultsort:blah}} - same question
Any others?
Currently I'm handling these:
* __TOC__ etc (magic words)
* #REDIRECT
* ISBN, PMID, RFC (magic links)
Steve