The current localisation system has a number of undesirable properties:
* Start from a cold cache is extremely slow, taking from 20 seconds to several minutes.
* The database is preloaded with hundreds of default messages, causing:
  * Slow installation, to the point where web installation is entirely impossible on some resource-limited shared web hosts without commenting out the message cache section
  * Excessive disk usage and slow backups on sites with large numbers of near-empty wikis
* The message cache can exceed the 1MB limit of MemCached, causing total failure
* The performance of the message cache degrades when some of the keys are large
I spent a fair bit of time pondering how to fix this, but I think it was Rotem who finally suggested the obvious solution: don't have pages for default messages.
The only reason for preloading the MediaWiki namespace was to provide admins with model text upon which they could base their translations. This justification has long since disappeared, since action=edit, action=view and Special:Allmessages are now all capable of drawing default message text from the message files if the articles do not exist.
So here's what I've done in my working copy, soon to be committed:
* Removed InitialiseMessages.inc and rebuildMessages.php
* During upgrade, delete all pages in the MediaWiki namespace which were last modified by "MediaWiki default".
* Reoptimised the message cache for the sparse MediaWiki namespace.
The main message cache (i.e. the $wgDBname:messages key) will now be a faithful representation of the contents of the MediaWiki namespace, instead of (as it previously was) a representation of the contents of all messages. If a page does not exist, it will not have a message cache key.
To solve the performance problems of having a small number of large items, any page which is larger than some threshold (10KB by default) will only have a placeholder stored in the main message cache, instead of the complete page text. The full contents of these items are stored separately in the cache.
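The placeholder scheme described above can be illustrated with a small sketch. This is not the MediaWiki implementation: it is written in Python for brevity, and the function names, the dictionary layout, the placeholder string and the leading-space convention are all assumptions made for the example. Only the 10KB default threshold comes from the description above.

```python
# Illustrative sketch only; names and data layout are hypothetical.
MAX_MSG_SIZE = 10 * 1024   # 10KB default threshold, as described above
PLACEHOLDER = "!TOO BIG"   # assumed marker meaning "text is stored separately"


def build_main_cache(pages, max_size=MAX_MSG_SIZE):
    """Split existing MediaWiki-namespace pages into a main cache entry
    and a set of separately cached large items.

    pages maps page title -> page text. Pages that do not exist are
    simply absent, so they get no key in the main cache.
    """
    main, big = {}, {}
    for title, text in pages.items():
        if len(text.encode("utf-8")) > max_size:
            main[title] = PLACEHOLDER   # small stub in the main cache
            big[title] = text           # full text cached on its own
        else:
            # Leading space (an assumption of this sketch) keeps real
            # text from ever colliding with the placeholder string.
            main[title] = " " + text
    return main, big


def get_msg_from_namespace(title, main, big, defaults):
    """Look up a message: customised page first, file default otherwise."""
    if title not in main:
        return defaults.get(title)      # no page, so use the message files
    entry = main[title]
    if entry == PLACEHOLDER:
        return big[title]               # fetch the large item separately
    return entry[1:]                    # strip the leading space
```

With this layout the main entry stays small even when one page is very large, because the large text is replaced by a short stub and only fetched when that particular message is requested.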
-- Tim Starling
On 05/01/07, Tim Starling tstarling@wikimedia.org wrote:
So here's what I've done in my working copy, soon to be committed:
- Removed InitialiseMessages.inc and rebuildMessages.php
- During upgrade, delete all pages in the MediaWiki namespace which were
last modified by "MediaWiki default".
- Reoptimised the message cache for the sparse MediaWiki namespace.
The main message cache (i.e. the $wgDBname:messages key) will now be a faithful representation of the contents of the MediaWiki namespace, instead of (as it previously was) a representation of the contents of all messages. If a page does not exist, it will not have a message cache key.
To solve the performance problems of having a small number of large items, any page which is larger than some threshold (10KB by default) will only have a placeholder stored in the main message cache, instead of the complete page text. The full contents of these items are stored separately in the cache.
Sounds good, but...
...will this in any way affect the means by which extensions have to add messages to the Message Cache? Will the existing interfaces still work, or do we now have to update all the code, 'cause I'm concerned about people breaking backwards compatibility again.
Rob Church
Rob Church wrote:
Sounds good, but...
...will this in any way affect the means by which extensions have to add messages to the Message Cache? Will the existing interfaces still work, or do we now have to update all the code, 'cause I'm concerned about people breaking backwards compatibility again.
The only interface change is that I've renamed getFromCache() to the more accurate getMsgFromNamespace(), but that function was implicitly private. Anything that accesses $wgMessageCache->mCache directly will be broken. But the usual public interfaces such as addMessages() and get() are preserved.
I'm usually pretty careful these days to maintain interface compatibility, but it was no challenge this time around, since the code changes are fairly minimal.
-- Tim Starling
On 05/01/07, Tim Starling tstarling@wikimedia.org wrote:
The only interface change is that I've renamed getFromCache() to the more accurate getMsgFromNamespace(), but that function was implicitly private. Anything that accesses $wgMessageCache->mCache directly will be broken. But the usual public interfaces such as addMessages() and get() are preserved.
I'm usually pretty careful these days to maintain interface compatibility, but it was no challenge this time around, since the code changes are fairly minimal.
Excellent, thanks very much.
Rob Church
On 1/5/07, Tim Starling tstarling@wikimedia.org wrote:
- The message cache can exceed the 1MB limit of MemCached, causing total
failure
For what it's worth, this can be adjusted with a compile-time constant: http://lists.danga.com/pipermail/memcached/2006-January/001879.html
(Of course, you have many other good points beyond this...)
"Tim Starling" tstarling@wikimedia.org wrote in message news:enlnmo$69l$1@sea.gmane.org...
- During upgrade, delete all pages in the MediaWiki namespace which were
last modified by "MediaWiki default".
Perhaps add a second check for pages that weren't, to see if the contents are identical to the expected default contents. I'm sure there are lots of cases where a message was changed and subsequently reverted, or where spelling errors were fixed locally before upgrading to a version where they were fixed by default.
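As a rough illustration of the suggestion, the two checks could be combined as below. This is a sketch in Python, not the upgrade script itself; the page dictionary layout and the function name are hypothetical, and only the "MediaWiki default" user name comes from the thread.

```python
# Illustrative sketch only; the data layout is hypothetical.
DEFAULT_USER = "MediaWiki default"


def should_delete(page, defaults):
    """Decide whether a MediaWiki-namespace page is redundant.

    page is a dict with 'title', 'last_editor' and 'text'.
    defaults maps message title -> default text from the message files.
    """
    # Original rule: the last edit was made by the installer's user.
    if page["last_editor"] == DEFAULT_USER:
        return True
    # Suggested second check: a human edited the page, but the text
    # ended up identical to the default (for example after a revert).
    expected = defaults.get(page["title"])
    return expected is not None and page["text"] == expected
```

A page that differs from the default in any way, or that has no default at all, is kept.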
- Mark Clements (HappyDog)
Tim Starling wrote:
So here's what I've done in my working copy, soon to be committed:
- Removed InitialiseMessages.inc and rebuildMessages.php
- During upgrade, delete all pages in the MediaWiki namespace which were
last modified by "MediaWiki default".
- Reoptimised the message cache for the sparse MediaWiki namespace.
I've gone ahead and taken this live; the batch deletions are running in the background.
A couple tweaks:
* The deleteDefaultMessages script now ensures that the 'MediaWiki default' user is set up as a bot, so the flood of deletions is hidden from recent changes.
* It turns out some messages try to transclude other messages. French Wikipedia's MediaWiki:Copyrightwarning for instance transcluded MediaWiki:Copyrightpage to get the default local page name for the copyright information page; also some of the default messages for the Special:Export page in various languages fetch the 'Main Page' name this way for the example text. I've changed the transclusion logic to fetch from the message cache when pulling a {{MediaWiki:}} page that doesn't exist in the database; that should better match the 'expected' behavior from these pages seeming to exist for viewing purposes.
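The transclusion fallback in the second tweak can be sketched roughly as follows. This is an illustration in Python rather than the actual parser change; the function name, the dictionary standing in for the database, and the message_lookup callable are assumptions, and title normalisation is ignored.

```python
# Illustrative sketch only; names are hypothetical.
MESSAGE_PREFIX = "MediaWiki:"


def fetch_transcluded_text(title, database, message_lookup):
    """Return the text to transclude for a page title.

    database maps full page titles to stored text.
    message_lookup is a callable taking a message name and returning
    its text from the message cache or files, or None if unknown.
    """
    if title in database:
        return database[title]          # a real page always wins
    if title.startswith(MESSAGE_PREFIX):
        # No page in the database: fall back to the message system so
        # the message still appears to exist when transcluded.
        return message_lookup(title[len(MESSAGE_PREFIX):])
    return None                         # ordinary missing page
```

For example, with an empty database and a lookup that knows 'Copyrightpage', transcluding MediaWiki:Copyrightpage yields the default text instead of a missing-page result.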
Some wikis also experienced a temporary problem with the '!TOO BIG' message being shown in place of all UI messages.
I think this could have been due to funny updating behavior; several machines which were recently reinstalled didn't have the 'sudo' configuration set up correctly, so parts of the update scripts didn't run correctly. This may have led to inconsistent behavior, though I'm not sure that's the cause; or it may have just been partial updates where MessageCache had new code but DefaultSettings didn't have the configuration variable yet, so the maximum message size triggered on everything.
-- brion vibber (brion @ pobox.com)
On 07.01.2007 13:44, Brion Vibber wrote:
I've gone ahead and taken this live; the batch deletions are running in the background.
On en.wikipedia I currently (14:56, 7 January 2007 UTC) see
<main page>
in the sidebar (html entities appearing on the rendered page?). I'm using monobook.
I've looked through recent changes in MediaWiki namespace on en but can't find anything that could explain this.
So could that problem have something to do with this change here?
Apologies for possibly asking stupid things in the wrong place.
On 1/7/07, Ligulem ligulem@pobox.com wrote:
On en.wikipedia I currently (14:56, 7 January 2007 UTC) see
<main page>
in the sidebar (html entities appearing on the rendered page?). I'm using monobook.
For future reference, that's the secret code for "the message 'main_page' is supposed to be here, but it doesn't exist". Seems to be fixed now, anyway.
Ligulem wrote:
On en.wikipedia I currently (14:56, 7 January 2007 UTC) see
<main page>
in the sidebar (html entities appearing on the rendered page?). I'm using monobook.
I've looked through recent changes in MediaWiki namespace on en but can't find anything that could explain this.
So could that problem have something to do with this change here?
Apologies for possibly asking stupid things in the wrong place.
It was reported on #wikimedia-tech at 15:08 UTC and I fixed it in under 5 minutes. That would have been the best place to report it -- the sooner we hear about it, the sooner we can fix it.
-- Tim Starling
wikitech-l@lists.wikimedia.org