[Wikipedia-l] Optimizing the Wiki parser

Uri Yanover uriyan_subscribe at yahoo.com
Fri Feb 8 05:32:27 UTC 2002


Hello,

During the course of our recent discussion, Jan
Hidders said:

> Have you seen the parsing code? There is nothing _very light_ about it, at
> the moment.

As soon as I had the opportunity, I located the
current Wiki parsing code:

(http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/fpw/wikiPage.php?rev=1.44&content-type=text/vnd.viewcvs-markup)

It occurred to me that while the current code is fine
as an initial version, it should be optimized if we
expect Wikipedia traffic to grow significantly, since
using PHP regular expressions and string management
functions to process the markup is inefficient: each
such pass rescans and copies the entire page text.

The obvious solution would be to write a "real"
one-pass parser. Doing this stand-alone (or as a CGI)
would be quickest in C; I do not know whether an
efficient solution exists in PHP, given the way it
implements strings. However, calling C code from PHP
is complicated in its own right (UNIX pipes being,
in my opinion, the simplest solution).
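
To make the idea concrete, here is a minimal sketch of
such a filter in C (my own illustration, not code from
the repository): it reads wiki text on stdin and writes
HTML on stdout, so it could sit behind a pipe from PHP.
It only handles the '' and ''' emphasis markup, purely
to show the single-pass, state-machine approach; a real
parser would of course have to cover links, lists,
headings and the rest, and escape HTML in the input.

  #include <stdio.h>

  int main(void)
  {
      char line[8192];
      int bold = 0, italic = 0;

      while (fgets(line, sizeof line, stdin) != NULL) {
          const char *p = line;
          while (*p != '\0') {
              if (p[0] == '\'' && p[1] == '\'') {
                  if (p[2] == '\'') {            /* ''' toggles bold   */
                      fputs(bold ? "</b>" : "<b>", stdout);
                      bold = !bold;
                      p += 3;
                  } else {                       /* ''  toggles italic */
                      fputs(italic ? "</i>" : "<i>", stdout);
                      italic = !italic;
                      p += 2;
                  }
              } else if (*p == '\n') {
                  /* Close any emphasis still open at the end of a line. */
                  if (italic) { fputs("</i>", stdout); italic = 0; }
                  if (bold)   { fputs("</b>", stdout); bold = 0; }
                  putchar(*p++);
              } else {
                  /* Everything else is copied through unchanged. */
                  putchar(*p++);
              }
          }
      }
      /* Close emphasis left open if the input lacked a final newline. */
      if (italic) fputs("</i>", stdout);
      if (bold)   fputs("</b>", stdout);
      return 0;
  }

The point is that every character of the page is
examined exactly once, with no repeated scanning or
string copying.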

I could try to rewrite the parser in either PHP
or C, but I first wanted to ask the members of the
list what they think of the subject (the extent
to which the code can be optimized, which language
to choose, with what technique the parser should be
written, which Wiki syntax changes should be made,
etc.).

Sincerely yours,
                        Uri Yanover



