Jens Frank wrote:
You should perhaps have a look at 1.3 first. Parts of the Parser are already a real parser, reading the wikitext in one pass, character by character. See Tokenizer.php and its use in Parser.php. This work is not yet completed, so the regexes still exist for some parts of the markup.
I hadn't seen this bit of the parser. Last time I looked at it, it was still splitting the string using regexes. When I saw the way you do it currently, I have to admit I went into a bit of a panic. In my experience, reading a large string character by character in a high-level language is a very bad idea. Indeed, our "parser" to date has gone to some lengths to avoid this, using regexes in all sorts of contrived ways to avoid executing a number of PHP lines proportional to the number of characters.
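To illustrate the trade-off I mean (a made-up sketch, not the actual code in Tokenizer.php or Parser.php):

    // Illustrative only. A per-character scan runs interpreted PHP for every byte:
    function scanCharByChar( $text ) {
        $tokens = array();
        $len = strlen( $text );
        for ( $i = 0; $i < $len; $i++ ) {
            $c = $text[$i];
            if ( $c == '[' || $c == "'" ) {
                $tokens[] = $c;   // token boundary found by PHP code
            }
        }
        return $tokens;
    }

    // One regex call keeps the per-character work inside the C regex engine:
    function scanWithRegex( $text ) {
        return preg_split( '/(\[\[|\]\]|\'{2,})/', $text, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
    }

The two functions don't return the same thing, of course; the point is where the per-character work happens.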
After I calmed down, I fixed up the profiler and did a couple of runs. Gabriel Wicke did some too, using ab. The results are at:
http://meta.wikipedia.org/wiki/Profiling
They show that the page view time for the current CVS HEAD is double what it was in 1.2.5. The parser itself is roughly 2.4 times slower.
This is completely unacceptable considering the current state of our web serving hardware. The latest batch of 1U servers won't cover the penalty from upgrading to 1.3. Our web servers are not keeping up with demand as it is; during peak times their queues all overflow, giving users random error messages.
The current plan is to revert the tokenizer sections of the parser to something similar to 1.2. Hopefully we'll get that working soon, since the Board vote feature I've written is a 1.3 extension and voting is meant to start in 4 days.
-- Tim Starling