On Fri, 26 Oct 2007 14:09:38 -0400, Simetrical wrote:
On 10/26/07, Steve Summit scs@eskimo.com wrote:
The question is how true it is that "almost every very-high-traffic page on Wikipedia is having extreme problems right now". I suspect it isn't, but if it is, is it because there are more pages with, say, heavy use of the {cite} template, or because templates like {cite} have gotten more complicated, or because template interpolation has somehow gotten slower, or simply because there are more hits and edits being processed every day, such that our headroom is going down?
Well, whatever the problem is, I suspect I know one way that would fix it: rewriting the parser in C(++). Unfortunately, that's a whole lot easier said than done. Rewriting even part of it, though, say replaceVariables, might be a big benefit.
I'm not sure simply porting to a different language would have such a huge effect, and it certainly isn't easy with a grammar that's not well defined. Currently, even if you were to render a large plain-text page with no markup, MW would still have to make about a dozen passes over the text to determine that there's really nothing to do; that's going to be slow, no matter what language it's done in. I think a much simpler interpreted parser would beat a complex compiled one, unless you're dealing with small pages where initial overhead is significant.
For now it might be best to refine our heuristics of what's slow to render. Currently we use a simple text-length heuristic, but perhaps it would make more sense to incorporate additional criteria. Maximum number of template inclusions? Maximum template depth? It would require testing to see what would be effective.
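To make the suggested criteria concrete, here is a minimal Python sketch (not MediaWiki code) that counts template openings and the maximum nesting depth in raw, unexpanded wikitext. The function names and all three thresholds are hypothetical placeholders; it only scans for `{{` and `}}`, so it counts `{{{parameters}}}` as templates and cannot see the depth reached after expansion.

```python
def template_stats(wikitext: str) -> tuple[int, int]:
    """Return (number of '{{' openings, maximum nesting depth) in raw wikitext."""
    count = depth = max_depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            count += 1
            depth += 1
            max_depth = max(max_depth, depth)
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return count, max_depth


def looks_expensive(wikitext: str, max_length: int = 200_000,
                    max_templates: int = 500, max_depth: int = 10) -> bool:
    """Flag a page if any of the (made-up) limits is exceeded."""
    count, depth = template_stats(wikitext)
    return len(wikitext) > max_length or count > max_templates or depth > max_depth
```

Whether limits like these track real render time any better than text length alone is exactly what the testing would have to show.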
I don't think text length is a very accurate measure; we definitely need something better. Also, I think a big part of the problem is with the parser functions; they tend to first expand every template passed into them, then decide which one to keep. Deferring that expansion, which could be done by adding a keyword to each nested template call, should help there, although there may be a better way.
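A small Python sketch of the difference, for illustration only: `expand`, `if_eager` and `if_lazy` are hypothetical stand-ins, not MediaWiki functions, and the deferral here is done with callables rather than with a keyword on the template call.

```python
expansions = []  # records which hypothetical templates actually got expanded


def expand(name: str) -> str:
    """Stand-in for an expensive template expansion."""
    expansions.append(name)
    return f"<{name}>"


def if_eager(condition: str, then_text: str, else_text: str) -> str:
    # Both branches were already expanded by the caller before we got here.
    return then_text if condition.strip() else else_text


def if_lazy(condition: str, then_branch, else_branch) -> str:
    # Branches are zero-argument callables; only the chosen one is expanded.
    chosen = then_branch if condition.strip() else else_branch
    return chosen()


eager = if_eager("yes", expand("big_a"), expand("big_b"))
eager_count = len(expansions)  # both templates were expanded

expansions.clear()
lazy = if_lazy("yes", lambda: expand("big_a"), lambda: expand("big_b"))
lazy_count = len(expansions)  # only the branch that was kept was expanded
```

The output is the same either way; the lazy version just never pays for the discarded branch, which matters most when the branches are themselves deeply nested templates.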