Here is my analysis of MediaWiki's I18N system:
== Structure ==
First, you have a Language object. This object contains all the
localisable message strings, as well as other important
language-specific settings and custom behavior (uppercasing,
lowercasing, printing dates, formatting numbers, etc.)
The object is constructed from two sources: subclassed versions of
itself (classes) and Message files (messages).
== General use ==
You load a language object by calling the Language::factory() function.
This function loads the class file for the object (taking into account
fallback languages by using the fallback language's object but
overriding the language key) and returns that object. Nothing else
happens.
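A rough sketch of that fallback behaviour, not MediaWiki's actual code:
the SketchLanguage classes, the $fallbackOf map and
sketchLanguageFactory() are all hypothetical names, made up only to show
the idea of reusing a fallback language's class while keeping the
requested language code.

```php
<?php
// Hypothetical stand-ins for the real Language class and its subclasses.
class SketchLanguage {
    protected $code = 'en';
    public function setCode(string $code): void { $this->code = $code; }
    public function getCode(): string { return $this->code; }
}
class SketchLanguageFr extends SketchLanguage {}

// $fallbackOf is an assumed map of language code => fallback language code.
function sketchLanguageFactory(string $code, array $fallbackOf): SketchLanguage {
    $current = $code;
    $seen = [];
    $lang = null;
    while ($lang === null) {
        $class = 'SketchLanguage' . ucfirst($current);
        if (class_exists($class)) {
            // A dedicated class exists for this code (or for a fallback).
            $lang = new $class();
        } elseif (isset($fallbackOf[$current]) && !isset($seen[$current])) {
            // No class of its own: walk to the fallback, guarding against loops.
            $seen[$current] = true;
            $current = $fallbackOf[$current];
        } else {
            // Nothing more specific found: use the base class.
            $lang = new SketchLanguage();
        }
    }
    // The object may come from a fallback class, but keeps the requested code.
    $lang->setCode($code);
    return $lang;
}

$lang = sketchLanguageFactory('frc', ['frc' => 'fr']);
echo get_class($lang), ' ', $lang->getCode(), "\n"; // SketchLanguageFr frc
```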
When a message (or other localised data) is requested, a lazy-load
initializer is called. Now the real work starts. We'll first take the
scenario where the language is not cached. The system loads the Messages
file by:
require( $filename );
$cache = compact( self::$mLocalisationKeys );
...where self::$mLocalisationKeys is the list of variable names that may
be set in the localization file. This lets you use things like:
$fallback = false;
$rtl = false;
...and easily siphon them into arrays.
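To make the require/compact trick concrete, here is a small
self-contained sketch. The function name sketchLoadMessagesFile(), the
temporary file and the key list are my own assumptions for
illustration; only require and compact() come from the code above.

```php
<?php
// Hypothetical helper: load one messages file and collect the known variables.
function sketchLoadMessagesFile(string $filename, array $keys): array {
    // The required file assigns plain variables in this function's scope.
    require $filename;
    // Newer PHP versions complain when compact() is given a name that is
    // not set, so only the names the file actually defined are passed on.
    $defined = array_intersect($keys, array_keys(get_defined_vars()));
    // compact() turns variable names into [ 'name' => value, ... ].
    return compact($defined);
}

// A throwaway messages file, standing in for a real localization file.
$file = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents(
    $file,
    "<?php\n\$fallback = false;\n\$rtl = false;\n\$messages = ['mainpage' => 'Main Page'];\n"
);

// 'namespaceNames' is listed as a key but not defined by the file, so it is skipped.
$cache = sketchLoadMessagesFile($file, ['fallback', 'rtl', 'messages', 'namespaceNames']);
unlink($file);

print_r(array_keys($cache));
```

Note that the sketch would misbehave if a messages file defined a
variable called $filename or $keys, since those are the helper's own
locals.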
Then, we load the $fallback language (if not set, English) to fill in
the gaps in the messages. There is specialized behavior for certain
keys, as they can be mergeable maps, lists or alias lists (not sure what
the last one is).
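The gap-filling step could look roughly like the sketch below. This is
my own simplification: sketchMergeFallback() and the two key lists are
hypothetical, and alias lists are left out since I have not worked out
how they behave.

```php
<?php
// Hypothetical merge: $cache is the language's own data, $fallbackCache the
// data of its fallback. $mapKeys and $listKeys are assumed lists naming which
// keys are mergeable maps and mergeable lists.
function sketchMergeFallback(array $cache, array $fallbackCache, array $mapKeys, array $listKeys): array {
    foreach ($fallbackCache as $key => $fallbackValue) {
        if (!array_key_exists($key, $cache)) {
            // Key missing entirely: take the fallback's value.
            $cache[$key] = $fallbackValue;
        } elseif (in_array($key, $mapKeys, true)) {
            // Mergeable map: array union keeps our entries and fills the gaps.
            $cache[$key] = $cache[$key] + $fallbackValue;
        } elseif (in_array($key, $listKeys, true)) {
            // Mergeable list: append the fallback's items, dropping duplicates.
            $cache[$key] = array_values(array_unique(array_merge($cache[$key], $fallbackValue)));
        }
        // Any other key that is already set is left alone.
    }
    return $cache;
}

$fr = ['rtl' => false, 'messages' => ['mainpage' => 'Accueil'], 'toggles' => ['a']];
$en = [
    'rtl' => false,
    'messages' => ['mainpage' => 'Main Page', 'search' => 'Search'],
    'toggles' => ['a', 'b'],
    'linkTrail' => '/^([a-z]+)(.*)$/sD',
];
$merged = sketchMergeFallback($fr, $en, ['messages'], ['toggles']);
print_r($merged['messages']); // mainpage stays 'Accueil', search comes from English
```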
== Caching ==
MediaWiki has lots of caching mechanisms built in, which make the code
somewhat more difficult to understand. Before doing any loading,
MediaWiki will check the following places to see if we can be lazy:
1. $mLocalisationCache[$code] - just a variable where it may have been
stashed
2. serialized/$code.ser - compiled serialized language file
3. Memcached version of file (with expiration checking)
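The lookup order above can be sketched as follows. Everything here is
hypothetical: sketchFindCached(), the $memory array and the
$sharedCacheGet callable (standing in for Memcached) are my own names,
and the expiration check on the shared copy is left out to keep the
sketch short.

```php
<?php
// Returns the cached localisation data for $code, or null if every tier misses.
function sketchFindCached(string $code, array &$memory, string $serializedDir, callable $sharedCacheGet): ?array {
    // 1. In-process variable where the data may already be stashed.
    if (isset($memory[$code])) {
        return $memory[$code];
    }
    // 2. Precompiled serialized file.
    $file = "$serializedDir/$code.ser";
    if (is_file($file)) {
        $data = unserialize(file_get_contents($file));
        if (is_array($data)) {
            $memory[$code] = $data;
            return $data;
        }
    }
    // 3. Shared cache (Memcached in the real system; a callable in this sketch).
    $data = $sharedCacheGet($code);
    if (is_array($data)) {
        $memory[$code] = $data;
        return $data;
    }
    return null; // The caller has to do the full load.
}

// Usage with a temporary directory and a shared cache that always misses.
$dir = sys_get_temp_dir() . '/l10n_sketch_' . uniqid();
mkdir($dir);
file_put_contents("$dir/fr.ser", serialize(['rtl' => false]));

$memory = [];
$noSharedCache = function (string $code) {
    return false;
};
$fr = sketchFindCached('fr', $memory, $dir, $noSharedCache);
$de = sketchFindCached('de', $memory, $dir, $noSharedCache);

unlink("$dir/fr.ser");
rmdir($dir);
```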
Expiration checking consists of ensuring that every dependency has a
filemtime that matches the one bundled with the cached copy. Similar
checking could be implemented for serialized versions, as it seems that
they are not updated until manually recompiled.
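A minimal sketch of that kind of dependency check, assuming the cached
copy stores a simple map of file name to modification time.
sketchRecordDeps() and sketchIsExpired() are hypothetical helpers, not
functions from MediaWiki.

```php
<?php
// Record the modification time of every file the cached data was built from.
function sketchRecordDeps(array $files): array {
    $deps = [];
    foreach ($files as $file) {
        $deps[$file] = filemtime($file);
    }
    return $deps;
}

// The cached copy is stale as soon as any dependency is missing or modified.
function sketchIsExpired(array $deps): bool {
    foreach ($deps as $file => $mtime) {
        clearstatcache(true, $file); // Do not trust PHP's cached stat result.
        if (!file_exists($file) || filemtime($file) !== $mtime) {
            return true;
        }
    }
    return false;
}

// Usage: build the dependency list, then pretend the file was edited later.
$file = tempnam(sys_get_temp_dir(), 'dep');
$deps = sketchRecordDeps([$file]);
$before = sketchIsExpired($deps);
touch($file, $deps[$file] + 10);
$after = sketchIsExpired($deps);
unlink($file);

var_dump($before, $after);
```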
== Behavior ==
Things that are localizable:
- Weekdays (and abbrev)
- Months (and abbrev)
- Bookstores
- Skin names
- Math names
- Date preferences
- Date format
- Default date format
- Date preference migration map
- Default user option overrides
- Language names
- Timezones
- Character encoding conversion via iconv
- UpperLowerCase first (needs casemaps for some)
- UpperLowerCase
- Uppercase words
- Uppercase word breaks
- Case folding
- Strip punctuation for MySQL search
- Get first character
- Alternate encoding
- Recoding for edit (and then recode input)
- RTL
- Direction mark character depending on RTL
- Arrow depending on RTL
- Languages where italics cannot be used
- Number formatting (commafy, transform digits, transform separators)
- Truncate (multibyte)
- Grammar conversions for inflected languages
- Plural transformations
- Formatting expiry times
- Segmenting for diffs (Chinese)
- Convert to variants of language
- Language specific user preference options
- Link trails [[foo]]bar
- Language code (RFC 3066)
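As an illustration of the number formatting item (commafy, transform
digits, transform separators), here is a rough sketch. The function
names and the two map parameters are my own assumptions and do not
mirror the real Language methods.

```php
<?php
// Insert a comma between every group of three digits in the integer part.
function sketchCommafy(string $number): string {
    $parts = explode('.', $number, 2);
    $parts[0] = preg_replace('/\B(?=(\d{3})+$)/', ',', $parts[0]);
    return implode('.', $parts);
}

// Commafy first, then swap separators and digits in one pass. strtr() with an
// array replaces all pairs simultaneously, so '.' and ',' can trade places.
function sketchFormatNum(string $number, array $separatorMap = [], array $digitMap = []): string {
    $out = sketchCommafy($number);
    $map = $separatorMap + $digitMap;
    return $map === [] ? $out : strtr($out, $map);
}

echo sketchCommafy('1234567.891'), "\n";
echo sketchFormatNum('1234567.89', ['.' => ',', ',' => '.']), "\n";
```

The first call prints 1,234,567.891 and the second prints 1.234.567,89.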
Neat functionality:
- I18N sprintfDate
- Roman numeral formatting
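For reference, Roman numeral formatting can be done with the usual
greedy table walk. This is a generic sketch (sketchRomanNumeral() is my
own name, and the 1..3999 limit is an assumption), not a copy of
MediaWiki's implementation.

```php
<?php
// Convert 1..3999 to a Roman numeral; anything else is returned as digits.
function sketchRomanNumeral(int $n): string {
    if ($n < 1 || $n > 3999) {
        return (string)$n;
    }
    // Values in descending order, including the subtractive pairs.
    $table = [
        1000 => 'M', 900 => 'CM', 500 => 'D', 400 => 'CD',
        100 => 'C', 90 => 'XC', 50 => 'L', 40 => 'XL',
        10 => 'X', 9 => 'IX', 5 => 'V', 4 => 'IV', 1 => 'I',
    ];
    $out = '';
    foreach ($table as $value => $symbol) {
        // Take the largest symbol that still fits, as often as it fits.
        while ($n >= $value) {
            $out .= $symbol;
            $n -= $value;
        }
    }
    return $out;
}

echo sketchRomanNumeral(1994), "\n"; // MCMXCIV
```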