Rowan Collins wrote:
I know the current version doesn't do anything, but I've been meaning for a while to finalise a patch to show a message saying "This is a redirect to [[foo]]".
This has already been done in 1.4.
- Things like __NOTOC__ and stuff can be handled like this:
- Regard *everything* of the form __CAPITALLETTERS__ as a special token
Actually, it can be lower case currently. Unless we're going to hunt the database for examples where it is, best just treat __anystringofletters__ as needing to be investigated.
Indeed. I didn't know that. But it isn't a problem at all. Even with it being case-insensitive, I don't think it's asking too much of the users to put <nowiki> around anything that looks like these, since they are rarely intended as actual text. I highly doubt that any significant number of articles currently relies on them being text.
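To make the idea concrete, here is a rough sketch (Python, purely illustrative; the token representation and names are made up for the example):

    import re

    # Match __anystringofletters__, case-insensitively as noted above.
    SWITCH_RE = re.compile(r"__([A-Za-z]+)__")

    def tokenize(text):
        """Split text into plain strings and ('switch', NAME) tokens.
        Every __letters__ run becomes a token; a later pass decides
        whether the name is a real behaviour switch or an error."""
        tokens, pos = [], 0
        for m in SWITCH_RE.finditer(text):
            if m.start() > pos:
                tokens.append(text[pos:m.start()])
            tokens.append(("switch", m.group(1).upper()))
            pos = m.end()
        if pos < len(text):
            tokens.append(text[pos:])
        return tokens

    # tokenize("Intro __NoToc__ rest")
    #   -> ["Intro ", ("switch", "NOTOC"), " rest"]

(Text inside <nowiki> would of course have been taken out of the stream before this step.)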
- The template pseudo-variables (e.g. CURRENTMONTH) are similarly handled in post-processing.
By which do you mean they are treated as templates and then recognised as magic afterwards? Just curious.
Yep, that's right.
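In sketch form (Python again; the variable list and the template lookup here are stand-ins, not the real code):

    import datetime

    # Stand-in for the real list of magic variables.
    MAGIC_VARIABLES = {
        "CURRENTMONTH": lambda: "%02d" % datetime.date.today().month,
        "CURRENTYEAR":  lambda: str(datetime.date.today().year),
    }

    def fetch_template(name):
        """Stub for an ordinary template lookup."""
        return "(contents of Template:%s)" % name

    def expand_braces_node(name):
        """Post-processing: a parsed {{...}} node is only now checked
        against the magic-variable list; anything else is fetched as a
        normal template."""
        if name in MAGIC_VARIABLES:
            return MAGIC_VARIABLES[name]()
        return fetch_template(name)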
- HTML tags and extension names are either not internationalised, or all translations of them are made to work on all Wikipedias.
That seems a bit of a step backwards to me. Actually, everything that looks like an SGML tag has to be treated in one of three ways:
a) it is an extension, and everything from there to its partner should be unparsed / sent somewhere else for parsing
b) it's an allowed HTML tag, and should be put in the parse-tree as that kind of element, with its contents parsed "independently" (sort of)
c) it is neither of the above, and needs entity escaping so that it doesn't get as far as the browser still looking like HTML
I am perfectly happy with this, but since the parser is a stand-alone module, I cannot treat a particular word as case (a) on one Wikipedia but case (c) on another.
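For illustration, the classification might look like this (a sketch; the tag sets are examples only, and the whole disagreement is about whether EXTENSION_TAGS may differ per Wikipedia):

    # Example sets only; the real lists would come from configuration.
    EXTENSION_TAGS = {"math", "nowiki", "timeline"}
    ALLOWED_HTML = {"b", "i", "table", "tr", "td", "div", "span"}

    def classify_tag(name):
        """The three cases from above."""
        name = name.lower()
        if name in EXTENSION_TAGS:
            return "extension"  # (a) contents handed off unparsed
        if name in ALLOWED_HTML:
            return "html"       # (b) becomes a parse-tree element
        return "escape"         # (c) entity-escape: &lt;name&gt;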
I'm not sure why you think allowing all translations on all Wikipedias would be a "step backwards"? Or do you seriously think someone would use the Chinese translation of <math> on the English Wikipedia? :)
But if you still insist on this, then I have two suggestions:
* We could replace the "other-language" words with the "this-language" words upon save. I.e. if someone wrote <math> on the Chinese Wikipedia, it would automatically be changed into "<" + some Chinese characters + ">" before storing it in the DB.
* Alternatively, we could have the parser recognise only the canonical (English) words, and have the PHP software replace non-English magic words with the canonical (English) words before invoking the parser. I am uncomfortable with this solution because it resorts to the same kind of patchwork that is irking me about the current not-a-parser.
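The second suggestion, sketched (the tag pattern and the mapping entry are hypothetical; a real mapping would hold e.g. the Chinese translation of "math"):

    import re

    # Hypothetical mapping from localised tag names to the canonical
    # (English) ones.
    CANONICAL = {"localmath": "math"}  # placeholder entry

    TAG_RE = re.compile(r"<(/?)(\w+)([^>]*)>")

    def canonicalise(text):
        """What the PHP software would do before invoking the parser,
        so the parser itself only ever sees the canonical words."""
        def repl(m):
            slash, name, rest = m.groups()
            return "<%s%s%s>" % (slash, CANONICAL.get(name, name), rest)
        return TAG_RE.sub(repl, text)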
Perhaps extensions could be made to return a parse sub-tree (even if it only has one element). Then we could use a HTML "extension" bound to all allowed HTML tags, which just called the original parser back on the contents of the tags.
This is an interesting thought, but I think it is inefficient with regard to performance. If the parser knows about the allowed HTML tags (and the difference between an HTML tag and an extension) beforehand, this extra step would be saved. Additionally, your idea works only for tags that are independent of other tags; it would not work well with tables.
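To make sure we mean the same thing, here is the idea as I understand it, in sketch form (Node and the parse callback are invented for the example):

    class Node:
        """Minimal parse-tree node for the sketch."""
        def __init__(self, kind, children=(), text=""):
            self.kind, self.children, self.text = kind, list(children), text

    def html_extension(tag, contents, parse):
        """A generic 'HTML extension' bound to every allowed tag: it
        calls the parser back on the tag's contents and wraps the
        resulting sub-tree. This is the extra round-trip mentioned
        above, and it breaks down for interdependent tags such as
        <table>/<tr>/<td>."""
        return Node(tag, children=parse(contents))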
Timwi