Since my plans for a nice, calm evening of backup & language setup were spoiled by slashdot, I've gotten a little tired of the performance guessing game. I've started adding some profiling code to actually show where the bottlenecks are; some preliminary stuff is put into CVS and running on meta.wikipedia.org.
Set $wgProfiling = true; in LocalSettings.php to enable profiling. Functions to be profiled (or sections of functions) should be wrapped in wfProfileIn( "functionnamegoeshere-nospacesplease" ) and wfProfileOut() calls (before all return points!). The output goes into the debug log file. An example:
20030123172827 0000.587 /wiki/Meta.wikipedia.org_technical_issues at 0000.043 in 0000.001 ( 0.1%) - main-misc-setup at 0000.043 in 0000.002 ( 0.4%) - * Article::getContent at 0000.046 in 0000.000 ( 0.1%) - *** OutputPage::removeHTMLtags at 0000.047 in 0000.000 ( 0.1%) - *** OutputPage:replaceVariables at 0000.048 in 0000.003 ( 0.5%) - *** OutputPage::doBlockLevels at 0000.051 in 0000.016 ( 2.7%) - *** OutputPage::replaceExternalLinks at 0000.067 in 0000.050 ( 8.5%) - *** OutputPage::replaceInternalLinks at 0000.046 in 0000.071 (12.1%) - ** OutputPage::doWikiPass2 at 0000.046 in 0000.071 (12.1%) - * OutputPage::addWikiText at 0000.043 in 0000.074 (12.6%) - Article::view at 0000.117 in 0000.000 ( 0.0%) - * OutputPage::output-headers at 0000.117 in 0000.000 ( 0.0%) - ** Skin::initPage at 0000.118 in 0000.005 ( 0.8%) - ** Skin::doBeforeContent at 0000.117 in 0000.006 ( 1.0%) - * OutputPage::output-middle at 0000.123 in 0000.448 (76.3%) - * OutputPage::output-bodytext at 0000.578 in 0000.009 ( 1.5%) - *** Skin::quickBar at 0000.571 in 0000.016 ( 2.8%) - ** Skin::doAfterContent at 0000.571 in 0000.016 ( 2.8%) - * OutputPage::output-after at 0000.117 in 0000.470 (80.0%) - OutputPage::output
Note that subcalls appear above their callers, just to make things more interesting. ;)
Most of our time seems is often spent in OutputPage::output; the precise section where the time gets spent seems to vary, and I suspect it depends on output buffering. It may make sense to move the profile summing bit to before the output actually starts to make it easier to see what the rendering is really doing.
In rendering land, the winner is OutputPage::replaceInternalLinks(). The more links in the page, the longer (and longer) this takes. Through the link cache, it makes a database query for every individual linked page to check if it exists. More detail should be checked for this to see how much of that time is spent in db queries; but I suspect that pre-filling the link cache from the links and brokenlinks tables should speed this up.
-- brion vibber (brion @ pobox.com)