On 10/26/07, Steve Sanbeg <ssanbeg@ask.com> wrote:
I'm not sure simply porting to a different language would have such a huge effect, and it certainly isn't easy with a grammar that's not well defined. Currently, even if you were to render a large plain-text page with no markup, MW would still have to make about a dozen passes over the text to determine that there's really nothing to do; that's going to be slow, no matter what language it's done in.
That depends on a number of things. Twelve passes in C is certainly a *lot* faster than twelve passes in PHP. Remember that the difference engine used to be one of the slowest components of MediaWiki, until it was rewritten (using an identical algorithm) in C++ -- now it's far faster than rendering the exact same page.
I think a much simpler interpreted parser would beat a complex compiled one, unless you're dealing with small pages where initial overhead is significant.
Tim once remarked to me on IRC that he suspected a one-pass PHP parser would be slower than our current one, simply because the current one avoids going through each character in PHP. Something like preg_split is fast precisely because it's executed in C: then PHP only has to deal with ten or twenty or two hundred chunks of text, rather than a hundred thousand individual characters.
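To make the chunking point concrete, here is a rough sketch in Python rather than PHP; Python's re.split stands in for preg_split, since both do their scanning in compiled C code. The token pattern and the helper name chunk_with_regex are made up for illustration and are not MediaWiki's actual grammar or code.

```python
import re

# Hypothetical markup tokens, for illustration only; not MediaWiki's grammar.
MARKUP = re.compile(r"(\{\{|\}\}|\[\[|\]\]|'{2,})")

def chunk_with_regex(text):
    """Split text at markup tokens. The scan itself runs inside the regex
    engine, so the interpreted code only ever sees the resulting chunks."""
    return [piece for piece in MARKUP.split(text) if piece]

# A mostly plain-text page with a little markup in the middle.
sample = "plain text " * 1000 + "[[Link]] and {{template}} " + "more text " * 1000

chunks = chunk_with_regex(sample)

# A character-by-character scanner written in the interpreted language would
# run one loop iteration per character; the regex version hands back a
# handful of chunks instead.
print(len(sample), "characters,", len(chunks), "chunks")
```

The interpreted loop then runs once per chunk instead of once per character, which is the whole advantage; how large it is in practice would need measuring.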
I don't think the text length is very accurate; we definitely need something better. Also, I think a big part of the problem is with the parser functions; they tend to first expand every template passed into them, then decide which one to keep. Deferring that expansion, which could be done by adding a keyword to each nested template call, should help there, although there may be a better way.
Well, if the expansion is deferred, that should be decided by the individual parser function, not by the call syntax for the template. Either way, I think some more careful benchmarking is needed here before anyone can say what limits are best to add. One thing that's for sure is that it's the templates/conditionals specifically that are the problem, not refs or links or whatever: replaceVariables takes up something like 50% of CPU time now, doesn't it? There are charts around somewhere.
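As a rough illustration of letting the parser function decide, here is a sketch in Python (not MediaWiki code). The names expand, if_eager and if_lazy are hypothetical; expand stands in for an expensive template expansion, and the lazy variant receives unexpanded branches as callables and expands only the one it keeps.

```python
expansions = []  # records which templates were actually expanded

def expand(name):
    """Hypothetical stand-in for an expensive template expansion."""
    expansions.append(name)
    return f"<{name}>"

def if_eager(condition, then_text, else_text):
    # Both branches were already expanded by the caller before we got here.
    return then_text if condition.strip() else else_text

def if_lazy(condition, then_thunk, else_thunk):
    # The parser function itself chooses which branch to expand.
    return then_thunk() if condition.strip() else else_thunk()

eager = if_eager("x", expand("A"), expand("B"))
eager_count = len(expansions)

expansions.clear()
lazy = if_lazy("x", lambda: expand("A"), lambda: expand("B"))
lazy_count = len(expansions)

print(eager == lazy, eager_count, lazy_count)
```

Both calls return the same text, but the eager one pays for the discarded branch as well. The template call syntax is the same in both cases; only the parser function's handling of its arguments differs.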