From: "Uri Yanover" <uriyan_subscribe(a)yahoo.com>
> It occurred to me that while the current code is fine
> as an initial version, it should be optimized if we
> expect Wikipedia traffic to grow significantly, since
> using PHP regular expressions and string-manipulation
> functions to process the markup is inefficient.
> The obvious solution would be to write a "real"
> one-pass parser.
Actually, that is what I first had in mind too, but now I
think that would be a bad idea. It makes the code
more complicated, and therefore harder to debug and
harder to extend with new and/or changed mark-up
features. It is important that the code is kept simple
enough that newcomers who find a bug are in principle,
if they know PHP and MySQL, able to find the
problem and send a patch.
Moreover, I don't think that doing it with
regular expressions is that inefficient. Regular
expressions, especially the Perl-compatible ones (PCRE), are
very well implemented in PHP, and most wiki systems
do it that way. If you are clever with Perl regular expressions
there is still a lot of optimization you can do. If you
want inspiration, look at the implementation of PhpWiki.
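To make the regex approach concrete, here is a minimal sketch of markup-to-HTML substitution. The wiki patterns (bold, italic, internal links) and the `/wiki/` URL scheme are illustrative assumptions, not the actual Wikipedia code, and it is written with Python's `re` module purely so the example is self-contained; PHP's `preg_replace` accepts the same PCRE patterns.

```python
import re

def wikify(text):
    # '''bold''' -> <b>...</b>; run first so the triple quotes
    # are consumed before the italic rule sees them
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    # ''italic'' -> <i>...</i>
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    # [[Page]] -> internal link; the /wiki/ prefix is a placeholder
    text = re.sub(r"\[\[([^]|]+)\]\]", r'<a href="/wiki/\1">\1</a>', text)
    return text

print(wikify("''hello'' '''world''' [[Main Page]]"))
# -> <i>hello</i> <b>world</b> <a href="/wiki/Main Page">Main Page</a>
```

Each rule is one compiled pass over the string, which is exactly where a PCRE engine is fast; the cost grows mainly with the number of rules, not with their complexity.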
> I could try to re-write the parser in either PHP
> or C, but first I wanted to ask the list members
> what they think on the subject (the extent
> to which the code can be optimized, which
> language to choose, in what technique the
> parser should be written, which Wiki syntax changes
> should be made, etc.).
My advice would be: first try it in PHP with the PCREs.
If that doesn't work, write a real one-pass parser in PHP. If that
also doesn't work, then you can start thinking about C and, for example,
yacc/bison. As for Wiki syntax changes, IMO we should first get the
current syntax working efficiently and error-free, and only then start
thinking about changes and extensions.
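For comparison, the "real" one-pass parser would look roughly like the following sketch: a single scan over the text that toggles open/close tags as it meets the markers. Again this is a hypothetical illustration in Python (the structure carries over to PHP or C directly), handling only bold and italic; it is not the actual parser under discussion.

```python
def parse_one_pass(text):
    """Single left-to-right scan; no regex engine involved."""
    out = []
    bold = italic = False
    i, n = 0, len(text)
    while i < n:
        if text.startswith("'''", i):
            # toggle bold state on each ''' marker
            out.append("</b>" if bold else "<b>")
            bold = not bold
            i += 3
        elif text.startswith("''", i):
            # toggle italic state on each '' marker
            out.append("</i>" if italic else "<i>")
            italic = not italic
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(parse_one_pass("'''b''' and ''i''"))
# -> <b>b</b> and <i>i</i>
```

The trade-off is exactly the one discussed above: the scan is O(n) with no backtracking, but every new markup feature means another branch in the loop, so the code grows harder to read than a list of substitution rules.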
Kind regards,
-- Jan Hidders