Rowan Collins wrote:
> I know the current version doesn't do anything, but I've been meaning
> for a while to finalise a patch to show a message saying "This is a
> redirect to [[foo]]".
This has already been done in 1.4.
>> * Things like __NOTOC__ and stuff can be handled like this:
>> * Regard *everything* of the form __CAPITALLETTERS__ as a special
>>   token
> Actually, it can be lower case currently. Unless we're going to hunt
> the database for examples where it is, best just treat
> __anystringofletters__ as needing to be investigated.
Indeed. I didn't know that. But it isn't a problem at all. Even with it
being case-insensitive, I don't think it's asking too much of the users
to put <nowiki> around anything that looks like one of these tokens,
since they are rarely intended to be actual text. I highly doubt that
any significant number of articles currently relies on them being
rendered as plain text.
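To illustrate what I have in mind (in Python for brevity; the real thing
would of course be PHP, and the set of known magic words here is just an
invented subset):

```python
import re

# Sketch: treat any __letters__ run (case-insensitive) as a candidate
# magic word; anything inside <nowiki>...</nowiki> is left untouched.
MAGIC_RE = re.compile(r'__([A-Za-z]+)__')
KNOWN_MAGIC = {'notoc', 'toc', 'forcetoc'}  # illustrative subset only

def scan_magic(wikitext):
    """Return (text with recognised magic words stripped, list of words found)."""
    found = []

    def repl(m):
        word = m.group(1).lower()
        if word in KNOWN_MAGIC:
            found.append(word)
            return ''           # strip recognised magic words
        return m.group(0)       # unknown __foo__ stays as literal text

    # crude <nowiki> protection: split out protected spans first
    parts = re.split(r'(<nowiki>.*?</nowiki>)', wikitext, flags=re.S)
    out = []
    for part in parts:
        if part.startswith('<nowiki>'):
            out.append(part)
        else:
            out.append(MAGIC_RE.sub(repl, part))
    return ''.join(out), found
```

So `__NOTOC__` gets stripped and recorded, `__Random__` survives as
literal text, and a <nowiki>-wrapped token is never touched.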
>> * The template pseudo-variables (e.g. CURRENTMONTH) are similarly
>>   handled in post-processing.
> By which, do you mean they are treated as templates and then
> recognised as magic after? Just curious.
Yep, that's right.
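In other words, something like this (Python sketch; the function names
and the variable table are invented, not the real implementation):

```python
import datetime

# {{CURRENTMONTH}} is tokenised like any other template call; only in a
# post-processing step is the name recognised as a magic variable.
MAGIC_VARIABLES = {
    'CURRENTMONTH': lambda: '%02d' % datetime.date.today().month,
}

def lookup_template(name):
    # placeholder for normal template inclusion (hypothetical)
    return '{{%s}}' % name

def expand_template(name):
    if name in MAGIC_VARIABLES:       # post-processing: magic, not template
        return MAGIC_VARIABLES[name]()
    return lookup_template(name)      # otherwise, ordinary inclusion
```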
>> * HTML tags and extension names are either not internationalised, or
>>   all translations of them are made to work on all Wikipedias.
> That seems a bit of a step backwards to me. Actually, everything that
> looks like an SGML tag has to be treated in one of three ways:
> a) it is an extension, and everything from there to its partner should
>    be unparsed / sent somewhere else for parsing
> b) it's an allowed HTML tag, and should be put in the parse tree as
>    that kind of element, with its contents parsed "independently"
>    (sort of)
> c) it is neither of the above, and needs entity escaping so that it
>    doesn't get as far as the browser still looking like HTML
I am perfectly happy with this, but since the parser is a stand-alone
module, I cannot treat a particular word as case (a) on one Wikipedia
but case (c) on another.
I'm not sure why you think allowing all translations on all Wikipedias
would be a "step backwards"? Or do you seriously think someone would use
the Chinese translation of <math> on the English Wikipedia? :)
But if you still insist on this, then I have two suggestions:
* We could replace the "other-language" words with the
"this-language"
words upon save. I.e. if someone wrote <math> on the Chinese
Wikipedia, it would automatically be changed into "<" + some Chinese
characters + ">" before storing it in the DB.
* Alternatively, we could have the parser recognise only the canonical
(English) words, and have the PHP software replace non-English magic
words with the canonical (English) words before invoking the parser.
I am uncomfortable with this solution because it resorts to the same
kind of patchwork that is irking me about the current not-a-parser.
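For clarity, the second suggestion would amount to a pre-pass like this
(Python sketch; the alias table is entirely invented, not real
MediaWiki localisation data):

```python
import re

# Map localised tag spellings to their canonical (English) forms before
# the text ever reaches the parser. These aliases are hypothetical.
ALIASES = {
    'formel': 'math',
    'mathe':  'math',
}

def canonicalise(wikitext):
    def repl(m):
        slash, name = m.group(1), m.group(2).lower()
        return '<%s%s>' % (slash, ALIASES.get(name, name))
    return re.sub(r'<(/?)(\w+)>', repl, wikitext)
```

The parser itself would then only ever need to know the canonical
English words.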
> Perhaps extensions could be made to return a parse sub-tree (even if
> it only has one element). Then we could use an HTML "extension" bound
> to all allowed HTML tags, which just called the original parser back
> on the contents of the tags.
This is an interesting thought, but I think it is inefficient in terms
of performance. If the parser knows about the allowed HTML tags (and
the difference between an HTML tag and an extension) beforehand, this
extra step is saved. Additionally, your idea works only for tags
that are independent of other tags; it would not work well with tables.
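Just to make sure we mean the same thing, here is how I understand your
suggestion (a toy Python sketch; Node and parse_wikitext are invented
stand-ins for the real machinery):

```python
# Every extension returns a parse sub-tree; allowed HTML tags are handled
# by a generic "extension" that calls the main parser back on its contents.

class Node:
    def __init__(self, kind, children=None, text=''):
        self.kind = kind
        self.children = children or []
        self.text = text

def parse_wikitext(text):
    # stand-in for the real parser: everything becomes one text node
    return [Node('text', text=text)]

def html_extension(tag, inner):
    # case (b) implemented as an extension: recurse into the parser
    return Node(tag, children=parse_wikitext(inner))

def math_extension(tag, inner):
    # case (a): contents are passed through unparsed
    return Node('math', text=inner)
```

The extra indirection through html_extension is exactly the step that
would be saved if the parser knew the allowed HTML tags up front, and a
table cell cannot be parsed in isolation from its surrounding table this
way.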
Timwi