Hi,
I'm still working on upgrading us to 1.12, and after upgrading the extensions I'm noticing that 1.12 still takes a lot longer to render a page. Measuring with ( $elapsed = $now - $wgRequestTime; ), on 1.9 I usually see about 0.15 seconds on average to serve a page, while on 1.12 I'm seeing more like 0.50 seconds, with both installations running on the same server and connected to the same DB server, etc.
Here are some of the functions I'm seeing take a while in the profiler.
462.884  MediaWiki::initialize
395.097  MediaWiki::performAction
328.284  Article::view
233.001  Parser::parse
232.899  Parser::parse-Article::outputWikiText
134.212  Parser::internalParse
103.388  Parser::replaceVariables
252.385  MediaWiki::finalCleanup
251.857  OutputPage::output
251.402  Output-skin
251.332  SkinTemplate::outputPage
209.752  SkinTemplate::outputPage-execute
855.008  -total
The full profile is here:
http://69.20.102.10/x/profile_deep.txt
Any ideas? I've tried drilling down to find out why some of these are taking so long. replaceInternalLinks seems to take a long time sometimes because the LocalFile::loadFromDB select statement occasionally takes over 50ms, but it looks like our indices on the image table are fine.
mysql> check table image;
+------------------+-------+----------+----------+
| Table            | Op    | Msg_type | Msg_text |
+------------------+-------+----------+----------+
| wikidb_112.image | check | status   | OK       |
+------------------+-------+----------+----------+
1 row in set (1.73 sec)
Thanks, Travis
A fair bit of setup time could be saved if Kaltura lazy-loaded their extension messages: register them with $wgExtensionMessagesFiles and then load them on demand with wfLoadExtensionMessages(). Every hit to the website is currently forcing Kaltura to load their messages via the KalturaInstall class' constructor. Their code isn't the easiest to follow, so I can't really suggest a fix here (plus, they don't host it in SVN; it's over on SourceForge).
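Roughly, the lazy-loading pattern looks like this (an untested sketch; the 'Kaltura' group name, i18n file path and message key below are placeholders, not their actual code):

    # In the extension's setup file, only register the message file:
    $wgExtensionMessagesFiles['Kaltura'] = dirname( __FILE__ ) . '/Kaltura.i18n.php';

    # Then, only in the code paths that actually display something:
    wfLoadExtensionMessages( 'Kaltura' );
    $label = wfMsg( 'kaltura-upload-button' );

That way the message file is only parsed when a message is actually needed, instead of on every request.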
-Chad
Yep,
I've been working with Kaltura to improve this; hopefully we can fix that up this week.
Removing the Kaltura extension, though, doesn't make any significant improvement; pages still take on average 0.5 seconds to generate. I'm still seeing a lot of time being spent in those other areas like Parser::parse, outputWikiText, etc.
If anyone has any suggestions, it'd be appreciated!
Travis
On Tue, Jun 24, 2008 at 8:57 PM, Travis (wikiHow) travis@wikihow.com wrote:
Removing the Kaltura extension, though, doesn't make any significant improvement; pages still take on average 0.5 seconds to generate. I'm still seeing a lot of time being spent in those other areas like Parser::parse, outputWikiText, etc.
If anyone has any suggestions, it'd be appreciated!
Tim Starling would be the most knowledgeable, I guess. The major change in the 1.12 parser is the addition of the new preprocessor, and that specifically revamps replaceVariables() et al. However, on Wikimedia it showed a considerable *decrease* in the time taken to render pages. Perhaps there's a configuration difference, or it might be something else entirely that's causing it. You could try switching to Parser_OldPP, which should work correctly in 1.12, to see if that makes a difference. However, Parser_OldPP appears to no longer be maintained to any significant extent, so it probably isn't advisable to use it indefinitely, at least not in 1.13. The ParserDiffTest extension can be used to systematically compare the performance (and output) of the old and new parsers.
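For reference, switching should just be a LocalSettings.php change, something like the following (I haven't verified this against your particular setup):

    $wgParserConf['class'] = 'Parser_OldPP';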
Great, I'll check in with Tim.
I did some more digging and something odd came up. I loaded the same set of 100 pages in 1.9 and 1.12 with profiling enabled, and in the profiling data, on average in 1.12 Database::doQuery took up 42.48% of the total time to process the page view, while in 1.9 it was more like 11.56%. This seems like a big jump. Is this expected for the new version?
1.12 seemed also to have more DB calls per page view, I put in some debugging info and 1.12 made about 72 calls on average while 1.9 made 60. I don't think this accounts for the big percentage jump though in time spent in Database::doQuery.
I repaired all of the tables in the 1.12 database and looked through the tables and they appear to have the proper indices set. To create the 1.12 db, I just dumped the 1.9 db, imported it into a new db, and ran the config/index.php. This page did time out occasionally on longer tasks, like setting rev_len in revision, so I'm wondering if some key element was skipped when the db was upgraded? It does look like rev_len is NULL for most of the entries; maybe something else was missed during the upgrade?
I don't know if this is a valid test, but I captured about 100 queries that 1.9 made, and ran them against both the 1.12 db and the 1.9 db:
# time mysql wikidb_19 < /tmp/queries.sql > /dev/null
real    0m0.055s

# time mysql wikidb_112 < /tmp/queries.sql > /dev/null
real    0m0.092s
It seemed to take about twice as long on 1.12, so maybe this is an environmental issue?
Travis
On Thu, Jun 26, 2008 at 12:02 PM, Travis (wikiHow) travis@wikihow.com wrote:
I did some more digging and something odd came up. I loaded the same set of 100 pages in 1.9 and 1.12 with profiling enabled, and in the profiling data, on average in 1.12 Database::doQuery took up 42.48% of the total time to process the page view, while in 1.9 it was more like 11.56%. This seems like a big jump. Is this expected for the new version?
I can't see why it would be. Are you running the two on the same machine? If so, is one of them being run against a production database and one against a testing database? If one is being used more than the other, that one is going to have more cache hits and work faster, so your production 1.9 site is of course going to run faster than your testing 1.12 site, even on the same server.
1.12 seemed also to have more DB calls per page view, I put in some debugging info and 1.12 made about 72 calls on average while 1.9 made 60.
Well, the current figure on Wikipedia seems to be under 10 queries per view:
http://noc.wikimedia.org/cgi-bin/ng/report.py?db=enwiki
So maybe you should be looking into what extra queries are being run. There are some features that can cause nasty query spam if things aren't configured right (like interwikis).
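One quick way to see exactly which queries each page view issues (assuming you have somewhere writable for a debug log on the test box) is to turn on SQL debug logging in LocalSettings.php, roughly:

    $wgDebugLogFile = '/tmp/mw-debug.log';  # any writable path
    $wgDebugDumpSql = true;

Then compare the query lists for the same page on 1.9 and 1.12.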
I repaired all of the tables in the 1.12 database
Are you using MyISAM? If so, why? MediaWiki isn't really designed for it at all; it uses InnoDB-specific features like primary key clustering heavily.
and looked through the tables and they appear to have the proper indices set. To create the 1.12 db, I just dumped the 1.9 db, imported it into a new db, and ran the config/index.php.
Why don't you use maintenance/update.php from a shell? It's almost certain to be more reliable.
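That is, roughly this, run from the wiki's root directory after backing up the database:

    php maintenance/update.php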
I don't know if this is a valid test, but I captured about 100 queries that 1.9 made, and ran them against both the 1.12 db and the 1.9 db:
# time mysql wikidb_19 < /tmp/queries.sql > /dev/null
real    0m0.055s

# time mysql wikidb_112 < /tmp/queries.sql > /dev/null
real    0m0.092s
It seemed to take about twice as long on 1.12, so maybe this is an environmental issue?
Not sure, that difference might be noise. Is it consistent? Make sure you repeat them both until you get a consistent figure, preferably on a server doing nothing else; otherwise you can get issues with caches getting messed up. Also, again, make sure that the databases are *both* cold, not production databases.
Hey,
Thanks for the reply, it's appreciated.
The 1.9 and 1.12 installs are running on the same machine, and are not receiving any traffic other than my testing. Both are pointing to the same MySQL server, but 1.9 is pointing to a production DB instance. I changed the wgProxyKey for both installs so they don't hit the live memcached; could 1.9 be getting some cached data from the production DB to make it perform better?
I'll take a look into the extra queries; they could be coming from our customizations and extensions.
This query occasionally takes a long time to execute; not sure why:
SELECT /* Parser::replaceLinkHolders */ page_id, page_namespace, page_title, page_is_redirect FROM `page` WHERE (page_namespace=0 AND page_title IN('Study-Better', 'Have-Fun-While-Studying', 'Socialize,-Be-Funny-and-Make-Friends', 'Create-Good-Study-Habits-for-Exams', 'Get-Good-Grades', 'Ace-Your-Next-Test', 'Get-Straight-A's', 'Upgrade-Your-Life-As-a-Teenager'))
I repaired all of the tables in the 1.12 database
Are you using MyISAM? If so, why? MediaWiki isn't really designed for it at all; it uses InnoDB-specific features like primary key clustering heavily.
No idea! Is it possible that, because we originally created this DB with a MW 1.3 install, not all of the tables have been converted? It looks like our `archive` table is MyISAM, but in new MW installs it's InnoDB. Should we switch them all over to InnoDB?
Why don't you use maintenance/update.php from a shell? It's almost certain to be more reliable.
Cool, I didn't know this existed. I'll use this next time.
Do you think it might be an improvement if both Title::getParentCategories and Title::getParentCategoryTree could store their results in a local variable? They currently fetch the results from the DB each time, regardless of whether they've already been called. I'm not sure how much this is called in a normal installation, but we call it a few times to figure out the category structure of the article.
Storing the results seems to cut down our # of db calls by about 12 per page view.
On Thu, Jun 26, 2008 at 4:20 PM, Travis (wikiHow) travis@wikihow.com wrote:
Do you think it might be an improvement if both Title::getParentCategories and Title::getParentCategoryTree could store their results in a local variable? They currently fetch the results from the DB each time, regardless of whether they've already been called. I'm not sure how much this is called in a normal installation, but we call it a few times to figure out the category structure of the article.
Storing the results seems to cut down our # of db calls by about 12 per page view.
It might be a reasonable idea. I would suggest only caching the result object, not the array, because arrays eat up a lot more memory, according to Tim. Something like this (diff won't apply cleanly because this is trunk):
-       # NEW SQL
-       $sql = "SELECT * FROM $categorylinks"
-           ." WHERE cl_from='$titlekey'"
-           ." AND cl_from <> '0'"
-           ." ORDER BY cl_sortkey";
+       if( !isset( $this->mParentCategories ) ) {
+           # NEW SQL
+           $sql = "SELECT * FROM $categorylinks"
+               ." WHERE cl_from='$titlekey'"
+               ." AND cl_from <> '0'"
+               ." ORDER BY cl_sortkey";

-       $res = $dbr->query( $sql );
+           $this->mParentCategories = $dbr->query( $sql );
+       }
+       $res = $this->mParentCategories;
However, if this is being caused by a specific extension, it might be a better idea to look at that extension for higher-level optimization. For instance, my diff here only catches repeated queries from the same Title object, which will not be all repeated queries. On the other hand, if you kept all the results in a static array, you're wasting memory, since they can never be garbage-collected, and this might be quite a lot of memory wasted if this method is called for a few hundred different pages in one request. A cache at the caller level is probably a better idea, since it can be comprehensive but at the same time fall out of scope and be garbage-collected when it's no longer needed.
If you find it's being caused by core code or an extension in SVN, of course, we'd like to hear about it.
On Thu, Jun 26, 2008 at 3:45 PM, Travis (wikiHow) travis@wikihow.com wrote:
The 1.9 and 1.12 installs are running on the same machine, and are not receiving any traffic other than my testing. Both are pointing to the same MySQL server, but 1.9 is pointing to a production DB instance. I changed the wgProxyKey for both installs so they don't hit the live memcached; could 1.9 be getting some cached data from the production DB to make it perform better?
Yes, definitely! It will use the OS cache for the file blocks and the MyISAM key buffer for the key blocks. It will be working with hot data, so it will certainly be much faster. A typical view on your 1.9 instance might require no actual disk reads at all, while your 1.12 install could require many. A key buffer hit wouldn't even require a system call. This could easily account for a fourfold difference or more -- in principle, even a thousand-fold difference, although you aren't seeing that here (production cache might not be totally hot, dev cache might not be totally cold).
You could try hitting the 1.12 install with artificial traffic to populate the caches, but it's still not necessarily going to behave quite the same. For performance benchmarking you really want exactly the same load patterns on everything you're testing. How far you want to go in achieving this is up to you. At a minimum I'd copy the 1.9 DB twice to two separate dev databases, and upgrade one of them. You could then do benchmarking with ab or a similar tool.
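For example, something along these lines against both copies once their caches are in the same state (the URL and request counts are only illustrative):

    ab -n 500 -c 5 'http://dev.example.com/index.php?title=Some_Article'

and then compare the mean time per request.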
I'll take a look into the extra queries; they could be coming from our customizations and extensions.
Two significant things that come to mind in the core software config: one, if you're running a large number of interwikis, you need special configuration for that to avoid a ton of DB traffic. Two, you should set $wgJobRunRate = 0 and have the job queue run using maintenance/runJobs.php in cron, not on each page view.
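For the job queue, that would be roughly the following (the schedule and path are just examples):

    # LocalSettings.php
    $wgJobRunRate = 0;

    # crontab entry on an app server, e.g. every five minutes
    */5 * * * * php /path/to/wiki/maintenance/runJobs.php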
This query occasionally takes a long time to execute; not sure why:
SELECT /* Parser::replaceLinkHolders */ page_id, page_namespace, page_title, page_is_redirect FROM `page` WHERE (page_namespace=0 AND page_title IN('Study-Better', 'Have-Fun-While-Studying', 'Socialize,-Be-Funny-and-Make-Friends', 'Create-Good-Study-Habits-for-Exams', 'Get-Good-Grades', 'Ace-Your-Next-Test', 'Get-Straight-A's', 'Upgrade-Your-Life-As-a-Teenager'))
It's almost certainly a bogus statistic. It could just reflect a long time waiting for a lock or something.
No idea! Is it possible that, because we originally created this DB with a MW 1.3 install, not all of the tables have been converted? It looks like our `archive` table is MyISAM, but in new MW installs it's InnoDB. Should we switch them all over to InnoDB?
I'd imagine that would be a good idea, yes. Make sure you reconfigure MySQL, though! Standard settings are totally unacceptable for heavy InnoDB use. You'll want to set key_buffer_size smaller and increase innodb_buffer_pool_size to most of your database server's available memory (if no serious MyISAM apps are running on the DB server -- and anyway temporary tables are always MyISAM, so don't set key_buffer_size too small!). This is a good article to read:
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimizati...
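As a rough illustration only (the buffer sizes below are made up; pick values based on your actual RAM), the conversion and tuning would look something like:

    ALTER TABLE archive ENGINE=InnoDB;   -- repeat for each remaining MyISAM table

    # my.cnf on the DB server
    innodb_buffer_pool_size = 3G
    key_buffer_size = 64M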
Cool, I didn't know this existed. I'll use this next time.
It's covered in the manual.
Cool, thanks for all of the info, it's been very helpful.
You could try hitting the 1.12 install with artificial traffic to populate the caches, but it's still not necessarily going to behave quite the same. For performance benchmarking you really want exactly the same load patterns on everything you're testing. How far you want to go in achieving this is up to you. At a minimum I'd copy the 1.9 DB twice to two separate dev databases, and upgrade one of them. You could then do benchmarking with ab or a similar tool.
I'd like to be sure it's going to stand up to traffic before releasing it to production. We've had upgrades before that had to be aborted because the load on the servers spiked as soon as they were upgraded. :) Making multiple copies of a 10GB DB is a bit of a pain, but I'll give it a shot! :)
It's almost certainly a bogus statistic. It could just reflect a long time waiting for a lock or something.
Cool. Would this likely be the same issue if I'm occasionally seeing LoadBalancer::getConnection take over 4 seconds in total on 1 page view?
For the replaceLinkHolders function, should $res and $varRes eventually be passed to a freeResult call? They don't seem to be; not sure if this is intentional, and it seems to be the same way in 1.9.
It's mostly our skin calling getParentCategoryTree on wgTitle, one call to set the breadcrumbs at the top of the page, another to set any meta tags associated with the category, another to determine the top level category for some UI features. Depending on the category structure, iterating over the tree can make as many as 10 calls to the DB, so we'll likely start caching that somewhere. I guess the Monobook skin doesn't show the category tree or breadcrumbs, so this isn't as much of an issue for a typical install.
Thanks again!
On Thu, Jun 26, 2008 at 4:51 PM, Travis (wikiHow) travis@wikihow.com wrote:
It's almost certainly a bogus statistic. It could just reflect a long time waiting for a lock or something.
Well, I should clarify that spending a long time waiting for a lock doesn't mean there's no problem, it just means that the problem isn't with the particular query. Switching to InnoDB might help with that a lot, if it is a locking issue, since reads don't normally do any locking.
Cool. Would this likely be the same issue if I'm occasionally seeing LoadBalancer::getConnection take over 4 seconds in total on 1 page view?
I can't claim to have a very good idea of what that method does, in practice. It looks quite complicated.
For the replaceLinkHolders function, should $res and $varRes eventually be passed to a freeResult call? They don't seem to be; not sure if this is intentional, and it seems to be the same way in 1.9.
Tim Starling has pointed out on this list that it makes no difference. Contrary to the manual, you don't seem to need to manually free results, at least not for memory-usage purposes. This code uses steadily more memory until it OOMs:
$enormous = array();
while( true ) {
    $enormous[] = $dbr->select( 'user', '*', array( 'user_id' => 1 ) );
}
But this uses practically no memory:
while( true ) {
    $result = $dbr->select( 'user', '*', array( 'user_id' => 1 ) );
}
It's mostly our skin calling getParentCategoryTree on wgTitle, one call to set the breadcrumbs at the top of the page, another to set any meta tags associated with the category, another to determine the top level category for some UI features. Depending on the category structure, iterating over the tree can make as many as 10 calls to the DB, so we'll likely start caching that somewhere. I guess the Monobook skin doesn't show the category tree or breadcrumbs, so this isn't as much of an issue for a typical install.
You might want to cache it in your skin, then. Although if it's just $wgTitle, then per-object Title-level caching should be equally effective and not an overly large memory hog.
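A minimal sketch of the skin-level version, assuming a member variable on your skin class (the names here are made up):

    var $mCatTree = null;

    function getCategoryTreeCached( $title ) {
        # Fetch the tree once per request; breadcrumbs, meta tags and the
        # top-level-category lookup can all reuse the same result.
        if ( $this->mCatTree === null ) {
            $this->mCatTree = $title->getParentCategoryTree();
        }
        return $this->mCatTree;
    }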
Travis (wikiHow) wrote:
The 1.9 and 1.12 installs are running on the same machine, and are not receiving any traffic other than my testing. Both are pointing to the same MySQL server, but 1.9 is pointing to a production DB instance. I changed the wgProxyKey for both installs so they don't hit the live memcached; could 1.9 be getting some cached data from the production DB to make it perform better?
$wgProxyKey has nothing to do with memcached; the only thing it's used for is the remote HTTP proxy scanning feature which is disabled by default and probably doesn't work right.
-- brion