Galician and Asturian are hardly new. Also, I'm not sure how Inuktitut
or Kurdish serve usability purposes or are duplicates...
Mark
On 31/12/2007, Siebrand Mazeland <s.mazeland(a)xs4all.nl> wrote:
I have not seen a comprehensive overview of MediaWiki
localisation discussed on the lists I am posting this message to, so I thought I might
give it a try. All statistics are based on MediaWiki 1.12 alpha, SVN version r29106.
==Introduction==
* Localisation or L10n: the process of adapting the software to be as familiar as
possible to a specific locale (in scope)
* Internationalisation or i18n: the process of ensuring that an application is capable of
adapting to local requirements (out of scope)
MediaWiki has a user interface (UI) definition for 319 languages. Of those, at least 17
language codes are duplicates and/or serve a usability purpose[1]. Reporting on them is not
relevant, however, so MediaWiki in its current state supports 302 languages. To generate
localisation statistics for a language, a MessagesXx.php file must be present in
languages/messages. There are currently 262 such files, of which 16 are redirects from the
duplicates/usability group[2]. So MediaWiki has an active in-product localisation for 236
languages. The remaining 66 languages have an interface, but simply fall back to English.
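To make the counting above concrete, here is a minimal Python sketch that derives language
codes from the file names in languages/messages and drops the redirect codes listed in [2].
It assumes the MediaWiki file naming convention where the code's first letter is upper-cased
and '-' becomes '_' (so zh-hk lives in MessagesZh_hk.php). The function names are
hypothetical, and this is not how the statistics in this message were actually generated.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Set

# Redirect/duplicate codes taken from footnote [2] of this message.
REDIRECT_CODES = {
    "crh", "iu", "kk", "kk-cn", "kk-kz", "kk-tr", "ku", "sr", "sr-jc", "sr-jl",
    "zh", "zh-cn", "zh-sg", "zh-hk", "zh-min-nan", "zh-yue",
}

# Assumed naming: "Messages" + code with first letter upper-cased and '-' -> '_'.
MESSAGES_FILE = re.compile(r"^Messages([A-Z][A-Za-z_]*)\.php$")


def code_from_filename(name: str) -> Optional[str]:
    """Map e.g. 'MessagesZh_hk.php' back to 'zh-hk'; None for other files."""
    match = MESSAGES_FILE.match(name)
    if match is None:
        return None
    return match.group(1).lower().replace("_", "-")


def active_localisations(filenames: Iterable[str]) -> Set[str]:
    """Language codes with a MessagesXx.php file, minus the redirect group."""
    codes = {code_from_filename(name) for name in filenames}
    codes.discard(None)  # ignore files that are not message files
    return codes - REDIRECT_CODES


def active_localisations_in(messages_dir: str) -> Set[str]:
    """Convenience wrapper that lists the file names in a directory."""
    return active_localisations(p.name for p in Path(messages_dir).iterdir())
```

Keeping the counting separate from the directory listing means the logic can be checked on a
plain list of file names before pointing it at a real checkout.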
The MediaWiki core product recognises several collections of localisable content (three
of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (1726)
* optional messages that can be localised, which usually only happens for languages not
using a Latin script (161)
* ignored messages that should not be localised (100)
* namespace names and namespace aliases (17)
* skin names (7)
* magic words (120)
* special page names (76)
* other (directionality, date formats, separators, book store lists, link trail, and
more)
Localisation of MediaWiki revolves around all of the above. Reporting is done on the
normal messages only.
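Since reporting uses the 'normal' messages only, completeness is measured over the English
message keys minus the optional and ignored ones. A minimal Python sketch of that idea,
assuming the three lists are available as plain sets of message keys (in MediaWiki they are
PHP arrays in messageTypes.inc). The function names and toy keys are hypothetical, and
treating a blank translation as untranslated is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple


def normal_message_keys(all_keys: Set[str], optional: Set[str],
                        ignored: Set[str]) -> Set[str]:
    """'Normal' messages: every key that is neither optional nor ignored."""
    return all_keys - optional - ignored


def normal_completion(english: Dict[str, str], translated: Dict[str, str],
                      optional: Set[str], ignored: Set[str]) -> Tuple[int, int]:
    """Return (translated, total) counted over the normal messages only.

    A key counts as translated when the language defines a non-blank text
    for it (an assumption of this sketch).
    """
    normal = normal_message_keys(set(english), optional, ignored)
    done = sum(1 for key in normal if translated.get(key, "").strip())
    return done, len(normal)
```

Note that a translated optional message does not raise the score, which matches reporting on
the normal messages only.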
MediaWiki is more than just the core product. Some 750 extensions have some kind of
documentation on http://www.mediawiki.org/wiki/Category:All_extensions. This analysis is
limited to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk. The
source code repository contains give or take 230 extensions. Of those 230 extensions, about
140 contain messages that can be visible in the UI in some use case (debugging excluded). Of
those 140, about 10 extensions have an exotic implementation of localisation, or no
localisation support at all (just English text in the code). Ten extensions appear to be
outdated. I have seen about 5 different 'standard' implementations of i18n in extensions.
wfLoadExtensionMessages has been available since MediaWiki 1.11, but not many extensions use
it for message handling yet. If you can help add more standard i18n support to extensions
(an overview can be found at
http://translatewiki.net/wiki/User:Siebrand/tobeadded) or help
standardise L10n for extensions, please do not hesitate.
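For anyone wanting to help, a rough way to find extensions that do not use
wfLoadExtensionMessages yet is to search their PHP sources for a call to it. The Python
sketch below does a plain regular-expression search. It assumes a checkout of
trunk/extensions with one directory per extension, it will also match calls inside comments,
and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable

# Matches a call such as wfLoadExtensionMessages( 'Foo' ); also matches in comments.
LOAD_CALL = re.compile(r"\bwfLoadExtensionMessages\s*\(")


def uses_load_extension_messages(sources: Iterable[str]) -> bool:
    """True if any of the given PHP source texts calls wfLoadExtensionMessages."""
    return any(LOAD_CALL.search(text) for text in sources)


def survey(extensions_root: str) -> Dict[str, bool]:
    """Map each extension directory name to whether it uses the function."""
    result = {}
    for ext_dir in sorted(Path(extensions_root).iterdir()):
        if ext_dir.is_dir():
            # Read files lazily so any() can stop at the first match.
            texts = (php.read_text(encoding="utf-8", errors="replace")
                     for php in ext_dir.rglob("*.php"))
            result[ext_dir.name] = uses_load_extension_messages(texts)
    return result
```

The extensions that come out False are only candidates; each one still needs a manual look
before it is added to the overview.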
==MediaWiki localisation in practice==
As far as I am aware, MediaWiki is currently localised in the following ways:
* in local wikis: Sysops on local wikis shape and translate messages to fit their needs.
This happens on Wikimedia, Wikia and Wikitravel wikis, corporate wikis, etc. This type of
localisation benefits the core product and extensions the least, because it happens
completely outside the view of svn committers. I have heard that Wikia supports languages
that are not supported in the svn version. I would like some help in identifying and
contacting these communities, to try to get their localisations into the core product.
Together with SPQRobin, I am trying to get what has been localised in local Wikipedias into
the core product, and to recruit the users who worked on that localisation to a more
centralised way of localising (see Betawiki).
* through bugzilla/svn: A MediaWiki user submits patches for core messages and/or
extensions. These users are mostly part of a Wikimedia wiki community. Their patches are
usually taken care of by committers (raymond, rotemliss, and sometimes others). Some users
maintain a language directly in SVN. At the moment, 10-15 languages are maintained this
way: Danish, German, Persian, Hebrew, Indonesian, Kazakh (3 scripts), Chinese (3 variants),
and a few more on a less frequent basis.
* through Betawiki: Betawiki was founded in mid 2005 by Niklas Laxström. Since then,
Betawiki has grown into a MediaWiki localisation community of over 200 users, which in each
of the past few months has contributed to the localisation of 120 languages. Users who are
only familiar with MediaWiki as a tool can localise almost every aspect of MediaWiki (except
for the group 'other' mentioned earlier) in a wiki interface. The translators' work is
regularly committed to svn by nikerabbit and me. Betawiki also offers a .po export that
enables users to make their translations with more advanced translation tools. This option
was added recently, and no translations in this format have been submitted yet. Betawiki
also supports translation of 122 extensions, and aims to support everything that can be
supported.
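Before submitting a translation made from the .po export, it can be handy to check how
complete it is. A minimal Python sketch, assuming the export uses ordinary gettext
msgid/msgstr entries on single lines. It is not a full .po parser: multi-line strings (an
empty msgstr "" followed by continuation lines) are counted as untranslated, fuzzy flags are
ignored, and the header entry with an empty msgid is skipped. The function name is
hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Single-line entries only, e.g.  msgid "Main Page"  /  msgstr "Hoofdpagina"
_ENTRY = re.compile(r'^(msgid|msgstr) "(.*)"$')


def po_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (translated, total) for single-line msgid/msgstr pairs."""
    translated = total = 0
    msgid: Optional[str] = None
    for raw in lines:
        match = _ENTRY.match(raw.strip())
        if match is None:
            continue  # comments, msgctxt, continuation lines, blank lines
        kind, text = match.groups()
        if kind == "msgid":
            msgid = text
        elif msgid:  # a msgstr belonging to a non-empty msgid (skips the header)
            total += 1
            if text:
                translated += 1
            msgid = None
    return translated, total
```

For real work, a dedicated .po library or the checks built into a translation tool are the
safer choice; this only gives a quick estimate.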
==MediaWiki localisation statistics==
MediaWiki localisation statistics have been around since June 2005 at
http://www.mediawiki.org/wiki/Localisation_statistics [3]. Traditionally, reports have
focused on the complete set of core messages. Recently a small study of message usage on the
Wikimedia cluster resulted in a set of almost 500 'most often used messages in MediaWiki'
(http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki).
Until recently, no statistics were available on the localisation of extensions. These
statistics can now be created with groupStatistics.php in the extension Translate. The
script reports on 'most often used MediaWiki messages', 'MediaWiki messages', and 'all
extension messages supported by extension Translate' (short: extension messages).
Additionally, a meta extension group of 34 extensions used in Wikimedia projects has been
created (short: Wikimedia messages). A regularly updated table of these statistics can be
found at http://translatewiki.net/wiki/Translating:Group_statistics.
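At its core, such a report is set arithmetic: for every language and every message group,
the share of the group's keys that has a translation. A minimal Python sketch of that idea;
it is not the actual groupStatistics.php implementation, and the data layout (groups and
translations as sets of message keys) and the names are assumptions.

```python
from typing import Dict, Set


def group_statistics(groups: Dict[str, Set[str]],
                     translated: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Percentage of each group's message keys translated, per language.

    groups:     group name -> message keys in that group (groups may overlap)
    translated: language code -> message keys translated in that language
    """
    return {
        lang: {name: 100.0 * len(keys & done) / len(keys)
               for name, keys in groups.items() if keys}  # skip empty groups
        for lang, done in translated.items()
    }
```

Because the groups are plain key sets, a meta group such as the Wikimedia messages is just
the union of the key sets of its member extensions.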
Some (arbitrary) milestones have been set for the four collections of messages mentioned
above. For the usability of MediaWiki in a particular language, the group 'core-mostused' is
the most important: a language must reach this milestone for MediaWiki to have minimal
support for it. The Wikimedia language committee is considering using the milestones for the
first two groups as a requirement for new Wikimedia wikis. The milestones are:
* core-mostused (496 messages): 98%
* wikimedia extensions (354 messages): 90%
* core (1726 messages): 90%
* extensions (1785 messages): 65%
Currently the following numbers of languages have passed the above milestones:
* core-mostused: 47 (15,5% of supported languages)
* wikimedia extensions: 10 (3,3% of supported languages)
* core: 49 (16,2% of supported languages)
* extensions: 7 (2,3% of supported languages)
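The milestones translate into concrete message counts, and the percentages above are shares
of the 302 supported languages. A small Python sketch of both calculations using the figures
from this message; the function names are hypothetical. The quoted percentages are
reproduced by truncating to one decimal rather than rounding (47 of 302 is 15.56...%), so
the sketch truncates.

```python
from typing import Dict, Tuple

# group -> (number of messages, milestone in percent), as listed above
MILESTONES: Dict[str, Tuple[int, int]] = {
    "core-mostused": (496, 98),
    "wikimedia extensions": (354, 90),
    "core": (1726, 90),
    "extensions": (1785, 65),
}
SUPPORTED_LANGUAGES = 302


def messages_needed(total: int, percent: int) -> int:
    """Smallest number of translated messages reaching `percent` of `total`."""
    return (total * percent + 99) // 100  # integer ceiling, no float rounding


def passes(group: str, translated: int) -> bool:
    """True if `translated` messages meet the milestone for `group`."""
    total, percent = MILESTONES[group]
    return translated >= messages_needed(total, percent)


def share_of_supported(languages: int) -> str:
    """Share of supported languages, truncated to one decimal (47 -> '15.5')."""
    tenths = languages * 1000 // SUPPORTED_LANGUAGES
    return f"{tenths // 10}.{tenths % 10}"
```

For example, core-mostused needs 487 of its 496 messages, wikimedia extensions 319 of 354,
core 1554 of 1726 and extensions 1161 of 1785.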
==Conclusion==
So... are we doing well on localisation, or do we suck? My personal opinion is that we are
somewhere in between. Given that there are some 250 Wikipedias that all use the Wikimedia
Commons media repository, and that only 47 languages have minimal localisation, we could do
better. With Single User Login around the corner (isn't it?), we must do better. On the
other hand, new language projects within Wikimedia all have excellent localisation of the
core product. These languages include Asturian, Bikol Central, Lower Sorbian, Extremaduran,
and Galician. But where is Hindi, for example, with currently only 7% of core messages
translated?
With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries,
and with products like NGO-in-a-box that include MediaWiki, the potential of MediaWiki as a
tool for creating and preserving knowledge in the languages of the world is huge. We have
to tap into that potential, and *you* (yes, I am glad you read this far and are now reading
my appeal) can help. If you know people who are proficient in a language and like
contributing to localisation, please point them in the right direction. If you know of
organisations that could help localise MediaWiki, please approach them and ask for their
help.
We now have all the tools to successfully localise MediaWiki into any of the 7000 or so
languages that have been classified in ISO 639-3. We only need one person per language to
make it happen. Reaching the first two milestones (core-mostused and wikimedia extensions)
takes about 16 hours of work. With Betawiki or the .po export, little to no technical
knowledge is required.
This was the pitch. How about we aim to at least double the numbers by the end of 2008
to:
* core-mostused: 120
* wikimedia extensions: 50
* core: 90
* extensions: 20
I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2008.
Cheers!
Siebrand Mazeland
[1] als,crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[2] crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[3] Older locations are http://www.mediawiki.org/wiki/Localisation_statistics/stats and
http://meta.wikimedia.org/wiki/Localization_statistics
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l