On Thu, 23 Sep 2004 20:09:21 +0100, Timwi <timwi@gmx.net> wrote:
Still, doesn't this mean the parser needs to recognise "#REDIRECT <linkpattern>" as a special token? And doesn't that, in turn, present a problem if we want to retain MagicWord i18n?
Not really. We can still recognise redirects with a regexp (or anything else in PHP) before passing the page to the parser.
But why make that a special case? Why say "before using the nice, efficient real parser, use a not-a-parser to check for the #REDIRECT directive, and have it do some voodoo"? Far better to just have the parser recognise "#REDIRECT" (and any variants anyone wants) and output a parse tree with a special redirect node.
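To make the "special redirect node" idea concrete, here is a rough sketch in C of what I have in mind. Everything in it (the node struct, `parse_page`, `match_keyword`, the 256-byte target buffer) is made up for illustration and is not taken from any existing parser; a real grammar would handle this as one rule among many rather than as a hand-written function.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds; a real parse tree would have many more. */
enum node_kind { NODE_ARTICLE, NODE_REDIRECT };

struct node {
    enum node_kind kind;
    char target[256];   /* link target, only filled in for NODE_REDIRECT */
};

/* Case-insensitive prefix match.
 * Returns the keyword length if text starts with keyword, else 0. */
static size_t match_keyword(const char *text, const char *keyword)
{
    size_t i;
    for (i = 0; keyword[i] != '\0'; i++) {
        if (toupper((unsigned char)text[i]) != toupper((unsigned char)keyword[i]))
            return 0;
    }
    return i;
}

/* Returns a NODE_REDIRECT node if the page starts with "#REDIRECT [[target]]",
 * and a plain NODE_ARTICLE node otherwise. */
struct node parse_page(const char *text)
{
    struct node n = { NODE_ARTICLE, "" };
    size_t pos = match_keyword(text, "#REDIRECT");
    const char *close;
    size_t len;

    if (pos == 0)
        return n;
    text += pos;
    while (*text == ' ' || *text == '\t')   /* optional whitespace before the link */
        text++;
    if (strncmp(text, "[[", 2) != 0)
        return n;
    text += 2;
    close = strstr(text, "]]");
    if (close == NULL)
        return n;
    len = (size_t)(close - text);
    if (len == 0 || len >= sizeof n.target)  /* empty or oversized target: not a redirect */
        return n;
    memcpy(n.target, text, len);
    n.target[len] = '\0';
    n.kind = NODE_REDIRECT;
    return n;
}
```

The caller then only has to look at `kind`; there is no separate pre-pass over the text before the parser runs.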
First of all, even in the current system there is no way for server admins to customise the magic words without modifying actual source code.
Well, technically, no, but Language*.php and LocalSettings.php are more like configuration files that happen to be executable for convenience. Editing the declaration of $wgMagicWordsEn in Language.php is no more difficult or involved than, say, editing a .ini file.
Secondly, you're making it sound like recompiling the parser was some sort of monumental task.
Actually, I have to admit I had no idea how difficult it would be, but I assumed it would mean having at least a compiler, if not a compiler-compiler and a whole load of other tools. Editing PHP doesn't need that kind of thing, and the way it's designed now, you needn't notice you're editing code.
Here's an idea. One could provide a .c or .h file where #define statements are used to define the magic words, and then make sure that if you modify it, you only need to recompile the binary (i.e. invoke gcc) but you don't need flex, bison, or swig. But even if you were to require flex, bison and swig, even then the recompilation can be automated by a simple script.
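As a rough illustration of the #define approach, something like the following might do. The macro names, the `lookup_magic_word` function and the non-English spellings are all invented for this sketch and are not copied from any real language file; the point is only that an admin edits the two #define lines and re-runs the C compiler, without touching flex, bison or swig.

```c
#include <stddef.h>
#include <string.h>

/* ---- would live in a hypothetical magicwords.h: the only part an admin edits ---- */
/* Comma-separated spellings per magic word (illustrative variants only). */
#define MAGIC_REDIRECT  "#REDIRECT", "#WEITERLEITUNG"
#define MAGIC_NOTOC     "__NOTOC__", "__KEININHALT__"

/* ---- the rest is fixed code that never needs editing ---- */
enum magic_id { MAGIC_NONE, MAGIC_ID_REDIRECT, MAGIC_ID_NOTOC };

static const char *const redirect_words[] = { MAGIC_REDIRECT };
static const char *const notoc_words[]    = { MAGIC_NOTOC };

/* Returns id if token exactly matches one of the given spellings, else MAGIC_NONE. */
static enum magic_id find_in(const char *const *words, size_t count,
                             const char *token, enum magic_id id)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (strcmp(words[i], token) == 0)
            return id;
    }
    return MAGIC_NONE;
}

/* Maps a token to its magic word, or MAGIC_NONE if it is not one.
 * Matching is case-sensitive in this sketch. */
enum magic_id lookup_magic_word(const char *token)
{
    enum magic_id id = find_in(redirect_words,
                               sizeof redirect_words / sizeof redirect_words[0],
                               token, MAGIC_ID_REDIRECT);
    if (id == MAGIC_NONE)
        id = find_in(notoc_words,
                     sizeof notoc_words / sizeof notoc_words[0],
                     token, MAGIC_ID_NOTOC);
    return id;
}
```

Because the spellings end up in ordinary string tables, the lexer can call `lookup_magic_word` at run time, and the generated scanner source itself never changes.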
If it were possible to only require a C compiler, it would certainly be a favour to other admins running MediaWiki. It's going to be annoying enough for some of them to have to deal with a binary part as well as PHP.
So maybe you're right, and the only workable solution is to have all variants hard-coded in the parser. I guess this is where we come to regret adopting an "extension" syntax that matches/conflicts with the syntax used by "allowed bits of HTML".
True. If we had something like [!math x^2 + y^2 = z^2 !], then we could say "everything in [! ... !] is an extension". Would make life much easier.
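Just to show why that would be easier: with a dedicated delimiter, the parser could pick out the extension name and body without knowing any extension names in advance. A minimal sketch in C, assuming the hypothetical `[!name body !]` form above (the struct, the function name and the buffer sizes are invented for illustration):

```c
#include <stddef.h>
#include <string.h>

struct extension_call {
    char name[32];    /* e.g. "math" */
    char body[256];   /* raw text handed to the extension, unparsed */
};

/* Returns 1 and fills *out if text starts with "[!name body !]", else 0.
 * The name ends at the first whitespace; the body ends at the first "!]". */
int parse_extension(const char *text, struct extension_call *out)
{
    const char *name, *body, *end;
    size_t name_len, body_len;

    if (strncmp(text, "[!", 2) != 0)
        return 0;
    name = text + 2;
    name_len = strcspn(name, " \t\n");
    end = strstr(name, "!]");
    if (end == NULL || name_len == 0 || name_len >= sizeof out->name)
        return 0;
    if (name + name_len > end)      /* no whitespace between name and "!]" */
        return 0;

    body = name + name_len;
    while (body < end && (*body == ' ' || *body == '\t' || *body == '\n'))
        body++;                     /* skip whitespace after the name */
    body_len = (size_t)(end - body);
    while (body_len > 0 && body[body_len - 1] == ' ')
        body_len--;                 /* trim trailing spaces before "!]" */
    if (body_len >= sizeof out->body)
        return 0;

    memcpy(out->name, name, name_len);
    out->name[name_len] = '\0';
    memcpy(out->body, body, body_len);
    out->body[body_len] = '\0';
    return 1;
}
```

With the current syntax, by contrast, the parser has to know that <math> is an extension and not one of the allowed bits of HTML before it can decide how to treat the contents.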
It's oh so tempting to say "let's change it" but a) I'd be mobbed by everyone who voted for the current syntax (which includes myself) and b) we'd have to go through changing existing uses of <math>, or make it a special case, or something.
I think, considering all of the problems we have discussed, it makes a lot of sense to formulate a "rule" that the design of the parser should fulfill: The parser must know in advance how to parse everything. The resulting parse tree must not depend on anything other than the input wiki text.
Yep, I think you're probably right on that one. And as you say, the more things that can be done inside the parser, the better, since outside means PHP, and is likely to be less efficient.