Since my plans for a nice, calm evening of backup & language setup were spoiled by slashdot, I've gotten a little tired of the performance guessing game. I've started adding some profiling code to actually show where the bottlenecks are; some preliminary stuff is put into CVS and running on meta.wikipedia.org.
Set $wgProfiling = true; in LocalSettings.php to enable profiling. Functions to be profiled (or sections of functions) should be wrapped in wfProfileIn( "functionnamegoeshere-nospacesplease" ) and wfProfileOut() calls (before all return points!). The output goes into the debug log file. An example:
20030123172827 0000.587 /wiki/Meta.wikipedia.org_technical_issues at 0000.043 in 0000.001 ( 0.1%) - main-misc-setup at 0000.043 in 0000.002 ( 0.4%) - * Article::getContent at 0000.046 in 0000.000 ( 0.1%) - *** OutputPage::removeHTMLtags at 0000.047 in 0000.000 ( 0.1%) - *** OutputPage:replaceVariables at 0000.048 in 0000.003 ( 0.5%) - *** OutputPage::doBlockLevels at 0000.051 in 0000.016 ( 2.7%) - *** OutputPage::replaceExternalLinks at 0000.067 in 0000.050 ( 8.5%) - *** OutputPage::replaceInternalLinks at 0000.046 in 0000.071 (12.1%) - ** OutputPage::doWikiPass2 at 0000.046 in 0000.071 (12.1%) - * OutputPage::addWikiText at 0000.043 in 0000.074 (12.6%) - Article::view at 0000.117 in 0000.000 ( 0.0%) - * OutputPage::output-headers at 0000.117 in 0000.000 ( 0.0%) - ** Skin::initPage at 0000.118 in 0000.005 ( 0.8%) - ** Skin::doBeforeContent at 0000.117 in 0000.006 ( 1.0%) - * OutputPage::output-middle at 0000.123 in 0000.448 (76.3%) - * OutputPage::output-bodytext at 0000.578 in 0000.009 ( 1.5%) - *** Skin::quickBar at 0000.571 in 0000.016 ( 2.8%) - ** Skin::doAfterContent at 0000.571 in 0000.016 ( 2.8%) - * OutputPage::output-after at 0000.117 in 0000.470 (80.0%) - OutputPage::output
Note that subcalls appear above their callers, just to make things more interesting. ;)
Most of our time seems is often spent in OutputPage::output; the precise section where the time gets spent seems to vary, and I suspect it depends on output buffering. It may make sense to move the profile summing bit to before the output actually starts to make it easier to see what the rendering is really doing.
In rendering land, the winner is OutputPage::replaceInternalLinks(). The more links in the page, the longer (and longer) this takes. Through the link cache, it makes a database query for every individual linked page to check if it exists. More detail should be checked for this to see how much of that time is spent in db queries; but I suspect that pre-filling the link cache from the links and brokenlinks tables should speed this up.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
In rendering land, the winner is OutputPage::replaceInternalLinks(). The more links in the page, the longer (and longer) this takes. Through the link cache, it makes a database query for every individual linked page to check if it exists. More detail should be checked for this to see how much of that time is spent in db queries; but I suspect that pre-filling the link cache from the links and brokenlinks tables should speed this up.
Just turn on the stub detector and watch it take even longer ;-)
Seriously: On Phase II (RIP), I put all the existing (and broken) links into some text field. Maybe we should do that again for existing links *in addition* to the existing link table. We would get that list in the same query as the article body (no additional SQL query at all!) Then, we check if all the other links are broken, maybe write the "short list" back, and be done. Imagine: Not a single additional query for the Main Page!
The only thing we'd have to do is clear some of these "short lists" on page deletion. Shouldn't be a problem.
Magnus
On Don, 2003-01-23 at 18:51, Brion Vibber wrote:
In rendering land, the winner is OutputPage::replaceInternalLinks().
Told ya so: http://www.wikipedia.org/pipermail/wikitech-l/2002-November/001166.html
;-)
Can't we load the titles of all pages into memory persistently, perhaps using a ramdisk?
Regards,
Erik
wikitech-l@lists.wikimedia.org