On Fri, 26 Oct 2007 15:05:44 -0400, Simetrical wrote:
On 10/26/07, Steve Sanbeg ssanbeg@ask.com wrote:
I'm not sure simply porting to a different language would have such a huge effect, and it certainly isn't easy with a grammar that's not well defined. Currently, even if you were to render a large plain-text page with no markup, MW would still have to make about a dozen passes over the text to determine that there's really nothing to do; that's going to be slow no matter what language it's done in.
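To illustrate the multi-pass cost, here's a toy sketch in Python rather than PHP; the pass patterns are made up for illustration and are not MediaWiki's actual pipeline:

```python
import re

# A dozen stand-in markup passes. On a plain-text page each pass still
# scans the entire text before concluding there is nothing to rewrite.
PASSES = [r"'''.*?'''", r"''.*?''", r"\[\[.*?\]\]", r"\{\{.*?\}\}",
          r"^==.*?==$", r"^\*", r"^#", r"^;", r"^:", r"^----",
          r"<nowiki>.*?</nowiki>", r"~~~~"]

def render(text):
    for pat in PASSES:
        # Identity replacement: every sub() walks the whole string even
        # when, as here, no pattern ever matches.
        text = re.sub(pat, lambda m: m.group(0), text, flags=re.M)
    return text

plain = "just plain text " * 1000
assert render(plain) == plain  # twelve full scans, zero changes
```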
That depends on a number of things. Twelve passes in C is certainly a *lot* faster than twelve passes in PHP. Remember that the difference engine used to be one of the slowest components of MediaWiki, until it was rewritten (using an identical algorithm) in C++ -- now it's far faster than rendering the exact same page.
My own experience with Perl and C hasn't shown such dramatic differences, and some operations scale linearly with the number of passes. I was assuming PHP would be similar, although I haven't benchmarked differences in language or number of passes for this.
I think a much simpler interpreted parser would beat a complex compiled one, unless you're dealing with small pages where initial overhead is significant.
Tim once remarked to me on IRC that he suspected a one-pass PHP parser would be slower than our current one, simply because the current one avoids going through each character in PHP. Something like preg_split is fast precisely because it's executed in C: then PHP only has to deal with ten or twenty or two hundred chunks of text, rather than a hundred thousand individual characters.
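The chunking effect is easy to sketch. Here's a toy illustration in Python, with re.split standing in for PHP's preg_split and a made-up bold-marker renderer rather than anything from MediaWiki:

```python
import re

def render_bold(text):
    """Toy renderer: convert '''bold''' wiki markup to <b>...</b>.

    re.split (like preg_split) does the per-character scanning in C, so
    the interpreted loop below only touches a few chunks, not a hundred
    thousand individual characters.
    """
    chunks = re.split(r"'''", text)
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            # Alternate opening and closing tags between chunks.
            out.append("<b>" if i % 2 == 1 else "</b>")
        out.append(chunk)
    return "".join(out)

print(render_bold("plain '''bold''' plain"))
# plain <b>bold</b> plain
```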
The number of individual characters that are significant to wiki markup is actually fairly small. Changing it to one pass would significantly alter the language in a lot of cases. But I still think if we could do it in three or so passes it would be faster, even if we did have to deal with dozens, or even hundreds, of individual characters.
I don't think the text length is a very accurate measure; we definitely need something better. Also, I think a big part of the problem is the parser functions; they tend to first expand every template passed into them, and only then decide which one to keep. Deferring that expansion, which could be done by adding a keyword to each nested template call, should help there, although there may be a better way.
Well, if the expansion is deferred that should be decided by the individual parser function, not by the call syntax for the template. Either way, I think some more careful benchmarking is needed here before anyone can say what limits are best to add. One thing that's for sure is that it's the templates/conditionals specifically that are the problem, not refs or links or whatever: replaceVariables takes up something like 50% of CPU time now, or what? There are charts around somewhere.
Yes, certainly variable replacement. I think it's clear that something like {{#if:{{a}}|{{defer:b}}|{{defer:c}}}} would be more efficient than {{#if:{{a}}|{{b}}|{{c}}}}. If that behavior were implicit in #if, rather than adding a new modifier and plugging it into all the templates, so much the better.
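A toy sketch in Python of the difference, with expand() as a stand-in for template expansion and the function names invented for illustration (this is not MediaWiki code):

```python
calls = []

def expand(tpl):
    """Stand-in for template expansion; records each call it receives."""
    calls.append(tpl)
    return tpl  # pretend expansion is the identity

def if_eager(cond, a, b):
    # Current #if behavior: every argument is expanded before one
    # result is chosen and the others are thrown away.
    c, ea, eb = expand(cond), expand(a), expand(b)
    return ea if c else eb

def if_lazy(cond, a, b):
    # Hypothetical deferred #if: expand the condition, then only the
    # branch that is actually selected.
    return expand(a) if expand(cond) else expand(b)

calls.clear()
if_eager("yes", "{{b}}", "{{c}}")
eager_count = len(calls)   # 3 expansions

calls.clear()
if_lazy("yes", "{{b}}", "{{c}}")
lazy_count = len(calls)    # 2 expansions
```

With deeply nested templates inside each branch, the unused branch's entire expansion tree is skipped, which is where the real savings would come from.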
I agree that there should be benchmarking to suggest new limits. Really, we should have a cost per transclusion/function, which could vary by function, that the caller would be charged. This would address the issue much more accurately. A side effect might be that large classes of those spaghetti templates become inoperable.
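A sketch of what per-function cost accounting might look like, in Python rather than PHP; the class, cost table, and limit are all invented for illustration, and real values would have to come from the benchmarking discussed above:

```python
# Hypothetical relative costs per call kind; real numbers would come
# from profiling, not from this table.
COSTS = {"transclusion": 1, "#if": 2, "#switch": 5}

class ExpansionBudget:
    """Charge each template/parser-function call against a page budget."""
    def __init__(self, limit):
        self.limit = limit
        self.spent = 0

    def charge(self, kind):
        self.spent += COSTS.get(kind, 1)
        if self.spent > self.limit:
            raise RuntimeError("expansion budget exceeded")

budget = ExpansionBudget(limit=10)
for call in ["transclusion", "#if", "#switch", "transclusion"]:
    budget.charge(call)
print(budget.spent)  # 9
```

Pages built from cheap transclusions would keep working, while pages leaning heavily on expensive conditionals would hit the limit first, which is exactly the behavior the varying costs are meant to produce.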