I ran some aggregate live profiling on en, and noticed that wfMsg was responsible for 70% of request time. Each wfMsg() call was leading to a database query. Investigating, I noticed that the memcached key for messages was set to "error" -- this indicates that the script attempted to load messages from the database and save them to memcached, but setting the large value failed. This error is typical of the old slabs reassignment problem. However, that wasn't the cause in this case: none of the memcached servers were at their memory quota.
In 1.2, only the internal messages from the MediaWiki namespace were cached. In 1.3, custom messages are moved to the Template namespace and the whole namespace is cached. But since I only just started running the move script, we've been attempting to cache the whole namespace, including all the templates, for the last week. Presumably the size of the namespace was larger than memcached's value size limit, so it failed.
To restore site performance adversely affected by loading the messages, I temporarily switched off $wgUseDatabaseMessages. Then I ran the move script on en. It is still running as I type. When it is finished, I will clear the error value from memcached and re-enable $wgUseDatabaseMessages. The first web request after that should then cache the namespace successfully.
-- Tim Starling
Tim Starling wrote:
In 1.2, only the internal messages from the MediaWiki namespace were cached. In 1.3, custom messages are moved to the Template namespace and the whole namespace is cached. [...] The first web request after that should then cache the namespace successfully.
Why exactly are you trying to cache the entire namespace on a single Web request? Shouldn't you cache only those that the Web request actually requested? In other words, it should not have tried to cache all the templates in the MediaWiki namespace, but only those that were actually used by the Web requests. I would have thought this is the purpose of a cache: to store that which is commonly or recently accessed.
Timwi
Timwi wrote:
Why exactly are you trying to cache the entire namespace on a single Web request? Shouldn't you cache only those that the Web request actually requested? In other words, it should not have tried to cache all the templates in the MediaWiki namespace, but only those that were actually used by the Web requests. I would have thought this is the purpose of a cache: to store that which is commonly or recently accessed.
Then you'd have to make dozens of separate cache requests on every page view (increasing time and network overhead) or know ahead of time what you'll need to fetch (we're not really structured that way). Since the complete set of messages is relatively small (64-100k uncompressed) it's easier to grab a single stored array than to pick and choose which ones you think you'll be needing.
There isn't supposed to be anything in the MediaWiki namespace except the user interface messages.
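To make the trade-off concrete, here is a rough Python sketch (not MediaWiki code) that counts cache round trips for the two strategies. CountingCache, the "msg:" key prefix, and the "messages" key are made up for illustration.

```python
class CountingCache:
    """Hypothetical cache client that counts round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)


def render_per_message(cache, names):
    # One cache request for every message the page uses.
    return [cache.get("msg:" + name) for name in names]


def render_bulk(cache, names):
    # One cache request for the whole array, then local lookups.
    table = cache.get("messages") or {}
    return [table.get(name) for name in names]
```

With a page that uses 30 interface messages, render_per_message makes 30 round trips and render_bulk makes one, at the cost of transferring the full array each time.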
-- brion vibber (brion @ pobox.com)
Tim Starling wrote:
To restore site performance adversely affected by loading the messages, I temporarily switched off $wgUseDatabaseMessages. Then I ran the move script on en. It is still running as I type. When it is finished, I will clear the error value from memcached and re-enable $wgUseDatabaseMessages. The first web request after that should then cache the namespace successfully.
Just an update. After fixing this, script execution time was still fairly slow, around 4.2 seconds. After suppressing caching of redirects left behind in the MediaWiki namespace and re-enabling the temporarily disabled linkscc and parser caches, average execution time dropped to 307ms. These changes were made over a period of about 1.5 hours, so traffic may have changed slightly between the two measurements.
Enabling $wgLinkCacheMemcached should save us another 9ms or so. Does anyone know why this hasn't been done?
Most recent profiling data is at: http://meta.wikipedia.org/wiki/Profiling/Live_aggregate_20040604
-- Tim Starling
wikitech-l@lists.wikimedia.org