On Thu, 23 Sep 2004 20:09:21 +0100, Timwi <timwi(a)gmx.net> wrote:
Still, doesn't this mean the parser needs to recognise
"#REDIRECT <linkpattern>" as a special token? And doesn't that, in
turn, present a problem if we want to retain MagicWord i18n?
Not really. We can still recognise redirects with a regexp (or anything
else in PHP) before passing the page to the parser.
But why make that a special case? Why say "before using the nice
efficient real parser, use a not-a-parser to check for the #REDIRECT
directive, and have it do some voodoo"? Far better to just have the
parser recognise "#REDIRECT" (and any variants anyone wants) and
output a parse tree with a special redirect node.
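To make that concrete, here is a minimal sketch in C of the kind of thing I mean, not a description of any existing parser. The names ParseNode, NODE_REDIRECT, parse_page and redirect_words are made up for illustration, and "#WEITERLEITUNG" is only an example of a variant someone might want:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical node types; a real parse tree would have many more. */
typedef enum { NODE_ARTICLE, NODE_REDIRECT } NodeType;

typedef struct {
    NodeType type;
    char target[256];   /* link target, only meaningful for NODE_REDIRECT */
} ParseNode;

/* Variants compiled into the parser (illustrative list only). */
static const char *const redirect_words[] = { "#REDIRECT", "#WEITERLEITUNG" };

/* Case-insensitive prefix match; returns the prefix length, or 0 if none. */
static size_t match_word(const char *text, const char *word)
{
    size_t i;
    for (i = 0; word[i] != '\0'; i++) {
        if (toupper((unsigned char)text[i]) != toupper((unsigned char)word[i]))
            return 0;
    }
    return i;
}

/* Look only at the top of the page: "#REDIRECT [[Target]]" yields a
 * redirect node, anything else an ordinary article node. */
ParseNode parse_page(const char *text)
{
    ParseNode node = { NODE_ARTICLE, "" };
    size_t w, n = sizeof redirect_words / sizeof redirect_words[0];

    for (w = 0; w < n; w++) {
        size_t len = match_word(text, redirect_words[w]);
        const char *p, *end;

        if (len == 0)
            continue;               /* try the next variant */
        p = text + len;
        while (*p == ' ' || *p == '\t')
            p++;                    /* skip blanks before the link */
        if (p[0] != '[' || p[1] != '[')
            break;                  /* directive without a link */
        p += 2;
        end = strstr(p, "]]");
        if (end == NULL || end == p || (size_t)(end - p) >= sizeof node.target)
            break;                  /* unterminated, empty or too long */
        memcpy(node.target, p, (size_t)(end - p));
        node.target[end - p] = '\0';
        node.type = NODE_REDIRECT;
        break;
    }
    return node;
}
```

Whatever consumes the tree then only has to check node.type, and no regexp pass over the raw text is needed beforehand.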
First of all, even in the current system there is no way for server
admins to customise the magic words without modifying actual source
code.
Well, technically, no, but Language*.php and LocalSettings.php are
more like configuration files that happen to be executable for
convenience. Editing the declaration of $wgMagicWordsEn in
Language.php is no more difficult or involved than, say, editing a
.ini file.
Secondly, you're making it sound like recompiling the parser was
some sort of monumental task.
Actually, I have to admit I had no idea how difficult it would be, but
I assumed it would mean having at least a compiler, if not a
compiler-compiler and a whole load of other tools. Editing PHP doesn't
need that kind of thing, and the way it's designed now, you needn't
notice you're editing code.
Here's an idea. One could provide a .c or .h file where #define
statements are used to define the magic words, and then make sure that
if you modify it, you only need to recompile the binary (i.e. invoke
gcc) but you don't need flex, bison, or swig. But even if you were to
require flex, bison and swig, even then the recompilation can be
automated by a simple script.
If it were possible to only require a C compiler, it would certainly
be a favour to other admins running MediaWiki. It's going to be
annoying enough for some of them to have to deal with a binary part as
well as PHP.
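A rough sketch of how the #define approach might look, assuming the words live in one table that the scanner consults rather than in the flex rules themselves. The macro names, MagicWordId and magic_word_at are all hypothetical, and the two words are just examples:

```c
#include <string.h>

/* What a hypothetical magicwords.h might contain: the only lines an
 * admin would edit before re-running the C compiler. */
#define MAGIC_REDIRECT  "#REDIRECT"
#define MAGIC_NOTOC     "__NOTOC__"

/* Everything below would stay untouched. */
typedef enum { MW_NONE, MW_REDIRECT, MW_NOTOC } MagicWordId;

static const struct {
    const char *text;
    MagicWordId id;
} magic_words[] = {
    { MAGIC_REDIRECT, MW_REDIRECT },
    { MAGIC_NOTOC,    MW_NOTOC },
};

/* Return the id of the magic word that `text` starts with, or MW_NONE. */
MagicWordId magic_word_at(const char *text)
{
    size_t i;
    for (i = 0; i < sizeof magic_words / sizeof magic_words[0]; i++) {
        size_t len = strlen(magic_words[i].text);
        if (strncmp(text, magic_words[i].text, len) == 0)
            return magic_words[i].id;
    }
    return MW_NONE;
}
```

Under that assumption the grammar never changes when a word does, so a plain gcc run would be enough and flex, bison and swig would stay out of the picture.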
So maybe you're right, and the only workable solution is to have all
variants hard-coded in the parser. I guess this is where we come to
regret adopting an "extension" syntax that matches/conflicts with the
syntax used by "allowed bits of HTML".
True. If we had something like [!math x^2 + y^2 = z^2 !], then we could
say "everything in [! ... !] is an extension". Would make life much easier.
It's oh so tempting to say "let's change it" but a) I'd be mobbed by
everyone who voted for the current syntax (which includes myself) and
b) we'd have to go through changing existing uses of <math>, or make
it a special case, or something.
I think, considering all of these problems we have discussed, it makes a
real lot of sense to formulate a "rule" that the design of the parser
should fulfill: The parser must know in advance how to parse everything.
The resulting parse tree must not depend on anything other than the
input wiki text.
Yep, I think you're probably right on that one. And as you say, the
more things that can be done inside the parser, the better, since
outside means PHP, and is likely to be less efficient.
--
Rowan Collins BSc
[IMSoP]