On Fri, 26 Oct 2007 15:05:44 -0400, Simetrical wrote:
On 10/26/07, Steve Sanbeg ssanbeg@ask.com wrote:
I'm not sure simply porting to a different language would have such a huge effect, and it certainly isn't easy with a grammar that's not well defined. Currently, even if you were to render a large plain-text page with no markup, MW would still have to make about a dozen passes over the text to determine that there's really nothing to do; that's going to be slow no matter what language it's done in.
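To illustrate the multi-pass cost, here's a toy sketch in Python rather than PHP; the pass patterns are made up for illustration and are not MediaWiki's actual pipeline:

```python
import re

# A dozen stand-in markup passes. On a plain-text page each pass still
# scans the entire text before concluding there is nothing to rewrite.
PASSES = [r"'''.*?'''", r"''.*?''", r"\[\[.*?\]\]", r"\{\{.*?\}\}",
          r"^==.*?==$", r"^\*", r"^#", r"^;", r"^:", r"^----",
          r"<nowiki>.*?</nowiki>", r"~~~~"]

def render(text):
    for pat in PASSES:
        # Identity replacement: every sub() walks the whole string even
        # when, as here, no pattern ever matches.
        text = re.sub(pat, lambda m: m.group(0), text, flags=re.M)
    return text

plain = "just plain text " * 1000
assert render(plain) == plain  # twelve full scans, zero changes
```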
That depends on a number of things. Twelve passes in C is certainly a *lot* faster than twelve passes in PHP. Remember that the difference engine used to be one of the slowest components of MediaWiki, until it was rewritten (using an identical algorithm) in C++ -- now it's far faster than rendering the exact same page.
My own experience with Perl and C hasn't shown such dramatic differences, and some operations scale linearly with the number of passes. I was assuming PHP would be similar, although I haven't benchmarked differences in language or number of passes for this.
I think a much simpler interpreted parser would beat a complex compiled one, unless you're dealing with small pages where initial overhead is significant.
Tim once remarked to me on IRC that he suspected a one-pass PHP parser would be slower than our current one, simply because the current one avoids going through each character in PHP. Something like preg_split is fast precisely because it's executed in C: then PHP only has to deal with ten or twenty or two hundred chunks of text, rather than a hundred thousand individual characters.
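The chunking effect is easy to sketch. Here's a toy illustration in Python, with re.split standing in for PHP's preg_split and a made-up bold-marker renderer rather than anything from MediaWiki:

```python
import re

def render_bold(text):
    """Toy renderer: convert '''bold''' wiki markup to <b>...</b>.

    re.split (like preg_split) does the per-character scanning in C, so
    the interpreted loop below only touches a few chunks, not a hundred
    thousand individual characters.
    """
    chunks = re.split(r"'''", text)
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            # Alternate opening and closing tags between chunks.
            out.append("<b>" if i % 2 == 1 else "</b>")
        out.append(chunk)
    return "".join(out)

print(render_bold("plain '''bold''' plain"))
# plain <b>bold</b> plain
```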
The number of individual characters that are significant to wiki markup is actually fairly small. Changing it to one pass would significantly alter the language in a lot of cases. But I still think if we could do it in three or so passes it would be faster, even if we did have to deal with dozens, or even hundreds, of individual characters.
I don't think the text length is a very accurate measure; we definitely need something better. Also, I think a big part of the problem is the parser functions; they tend to first expand every template passed into them, and only then decide which one to keep. Deferring that expansion, which could be done by adding a keyword to each nested template call, should help there, although there may be a better way.
Well, if the expansion is deferred that should be decided by the individual parser function, not by the call syntax for the template. Either way, I think some more careful benchmarking is needed here before anyone can say what limits are best to add. One thing that's for sure is that it's the templates/conditionals specifically that are the problem, not refs or links or whatever: replaceVariables takes up something like 50% of CPU time now, or what? There are charts around somewhere.
Yes, certainly variable replacement. I think it's clear that something like {{#if:{{a}}|{{defer:b}}|{{defer:c}}}} would be more efficient than {{#if:{{a}}|{{b}}|{{c}}}}. If that behavior were implicit in #if, rather than adding a new modifier and plugging it into all the templates, so much the better.
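A toy sketch in Python of the difference, with expand() as a stand-in for template expansion and the function names invented for illustration (this is not MediaWiki code):

```python
calls = []

def expand(tpl):
    """Stand-in for template expansion; records each call it receives."""
    calls.append(tpl)
    return tpl  # pretend expansion is the identity

def if_eager(cond, a, b):
    # Current #if behavior: every argument is expanded before one
    # result is chosen and the others are thrown away.
    c, ea, eb = expand(cond), expand(a), expand(b)
    return ea if c else eb

def if_lazy(cond, a, b):
    # Hypothetical deferred #if: expand the condition, then only the
    # branch that is actually selected.
    return expand(a) if expand(cond) else expand(b)

calls.clear()
if_eager("yes", "{{b}}", "{{c}}")
eager_count = len(calls)   # 3 expansions

calls.clear()
if_lazy("yes", "{{b}}", "{{c}}")
lazy_count = len(calls)    # 2 expansions
```

With deeply nested templates inside each branch, the unused branch's entire expansion tree is skipped, which is where the real savings would come from.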
I agree that there should be benchmarking to suggest new limits. Really, we should have a cost per transclusion/function, which could vary by function, that the caller would be charged. This would address the issue much more accurately. A side effect might be that large classes of those spaghetti templates become inoperable.
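A sketch of what per-function cost accounting might look like, in Python rather than PHP; the class, cost table, and limit are all invented for illustration, and real values would have to come from the benchmarking discussed above:

```python
# Hypothetical relative costs per call kind; real numbers would come
# from profiling, not from this table.
COSTS = {"transclusion": 1, "#if": 2, "#switch": 5}

class ExpansionBudget:
    """Charge each template/parser-function call against a page budget."""
    def __init__(self, limit):
        self.limit = limit
        self.spent = 0

    def charge(self, kind):
        self.spent += COSTS.get(kind, 1)
        if self.spent > self.limit:
            raise RuntimeError("expansion budget exceeded")

budget = ExpansionBudget(limit=10)
for call in ["transclusion", "#if", "#switch", "transclusion"]:
    budget.charge(call)
print(budget.spent)  # 9
```

Pages built from cheap transclusions would keep working, while pages leaning heavily on expensive conditionals would hit the limit first, which is exactly the behavior the varying costs are meant to produce.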