On Fri, Aug 13, 2004 at 09:38:34PM +0200, Ashar Voultoiz wrote:
> Brion Vibber wrote:
>> Magnus Manske wrote:
>>> I therefore suggest a new structure:
>>> - Preprocessor
>>> - Wiki markup to XML
>>> - XML to (X)HTML
>> This doesn't actually solve any of the issues with the current parser, since it merely has it produce a different output format.
>> The main problems are that we have a mess of regexps that stomp on each other all the time.
>> -- brion vibber (brion @ pobox.com)
> Can't we switch back to the tokenizer parser and try to optimize it? The token approach seems much easier to maintain.
Character-by-character string parsing in PHP is slow because the per-character interpreter overhead is too high. Tokenizing would probably have to be done in a C(++) extension.
Another point where the tokenizer was slow was the byte-by-byte composition of the result string. I've been told that appending small strings to an array and joining them at the end is much faster; worth a try.
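As a rough illustration of the two strategies (a hypothetical micro-benchmark sketch, not code from the actual tokenizer; whether the array/implode variant really wins depends on the PHP version):

```php
<?php
// Two ways of composing a result string from many small fragments.
// buildByConcat() appends piece by piece to a growing string;
// buildByImplode() collects the pieces in an array and joins them
// once at the end with implode().

function buildByConcat(array $fragments): string {
    $result = '';
    foreach ($fragments as $f) {
        $result .= $f;   // repeated append to a growing string
    }
    return $result;
}

function buildByImplode(array $fragments): string {
    $parts = [];
    foreach ($fragments as $f) {
        $parts[] = $f;   // cheap array push
    }
    return implode('', $parts);  // single join at the end
}

// Both produce identical output; only the composition strategy differs.
$fragments = array_fill(0, 100000, 'token ');
$a = buildByConcat($fragments);
$b = buildByImplode($fragments);
assert($a === $b);
```

Timing each function with `microtime(true)` around the calls would show which approach is faster on a given PHP build.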
JeLuF