On 10/26/07, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
I'm not sure simply porting to a different language would have such a huge
effect, and it certainly isn't easy with a grammar that's not well defined.
Currently, even if you were to render a large plain-text page with no
markup, MW would still have to make about a dozen passes over the text to
determine that there's really nothing to do; that's going to be slow, no
matter what language it's done in.
That depends on a number of things. Twelve passes in C is certainly a
*lot* faster than twelve passes in PHP. Remember that the difference
engine used to be one of the slowest components of MediaWiki, until it
was rewritten (using an identical algorithm) in C++ -- now it's far
faster than rendering the exact same page.
I think a much simpler interpreted
parser would beat a complex compiled one, unless you're dealing with small
pages where initial overhead is significant.
Tim once remarked to me on IRC that he suspected a one-pass PHP parser
would be slower than our current one, simply because the current one
avoids going through each character in PHP. Something like preg_split
is fast precisely because it's executed in C: then PHP only has to
deal with ten or twenty or two hundred chunks of text, rather than a
hundred thousand individual characters.
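A rough sketch of that point, written in Python rather than PHP purely so it is easy to run; Python's re.split plays the role of preg_split here, since both hand the scanning to a regex engine written in C. The delimiter set and the function names are made up for illustration and are not MediaWiki's actual tokens or code.

```python
import re

# Hypothetical delimiter set, chosen only for illustration.
DELIMS = re.compile(r"(\{\{|\}\}|\[\[|\]\])")

def split_chunks(text):
    """Let the regex engine (C code) scan the text; keep the non-empty chunks."""
    return [chunk for chunk in DELIMS.split(text) if chunk]

def scan_chars(text):
    """Produce the same chunks, but visit every character in interpreted code."""
    chunks, buf, i = [], [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            if buf:
                chunks.append("".join(buf))
                buf = []
            chunks.append(pair)
            i += 2
        else:
            buf.append(text[i])
            i += 1
    if buf:
        chunks.append("".join(buf))
    return chunks
```

Both functions return the same list of chunks, but the first runs only a handful of interpreted steps per chunk while the second runs several per character. Timing the two on a large page (for example with timeit) would be the way to check how much that matters in practice.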
I don't think the text length is very accurate; we definitely need
something better. Also, I think a big part of the problem is with the
parser functions; they tend to first expand every template passed into
them, then decide which one to keep. Deferring that expansion, which
could be done by adding a keyword to each nested template call, should
help there, although there may be a better way.
Well, if the expansion is deferred that should be decided by the
individual parser function, not by the call syntax for the template.
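To make the distinction concrete, here is a minimal sketch in Python (not PHP, and not MediaWiki's real API). The names expand, if_eager and if_lazy are hypothetical. The eager version receives branches that were already expanded by the caller; the lazy version receives callables and decides for itself which one to expand.

```python
calls = []  # records which templates were actually expanded

def expand(name):
    """Stand-in for an expensive template expansion (hypothetical)."""
    calls.append(name)
    return f"<{name}>"

def if_eager(cond, then_text, else_text):
    # Both branches were expanded before this function was even called.
    return then_text if cond.strip() else else_text

def if_lazy(cond, then_thunk, else_thunk):
    # The parser function itself chooses which branch to expand.
    return then_thunk() if cond.strip() else else_thunk()

# Eager: expand("A") and expand("B") both run, one result is thrown away.
eager_result = if_eager("x", expand("A"), expand("B"))

# Lazy: only the branch that is kept gets expanded.
lazy_result = if_lazy("x", lambda: expand("A"), lambda: expand("B"))
```

Nothing changes in how the template call is written; only the parser function's implementation determines whether the unused branch is ever expanded.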
Either way, I think some more careful benchmarking is needed here
before anyone can say what limits are best to add. One thing that's
for sure is that it's the templates/conditionals specifically that are
the problem, not refs or links or whatever: replaceVariables takes up
something like 50% of CPU time now, or what? There are charts around
somewhere.