[Mediawiki-l] An update on localisation in MediaWiki (2009)

Siebrand Mazeland s.mazeland at xs4all.nl
Sat Jan 2 19:06:14 UTC 2010


On 31 December 2007 and 1 January 2008 I sent e-mails to which this is a 
follow-up[1,2].

First things first, because not everyone reads e-mails completely:
* MediaWiki localisation (that is, the translation of English source messages 
to other languages) depends on you! If you speak a language other than 
English, care about your language in MediaWiki and Wikimedia and like 
translating, go to http://translatewiki.net, register a user account and start 
contributing translations for MediaWiki and MediaWiki extensions. When your 
localisation is complete, keep coming back regularly to re-complete it and do 
quality control. Thank you in advance for all your contributions and effort.
* The i18n and L10n area of MediaWiki requires continuous effort. If this 
area of FOSS interests you, we need your help. Please offer your 
development skills to further MediaWiki's i18n, L10n and translation 
capabilities[3,4].

All statistics are based on MediaWiki 1.16 alpha, SVN version r60527 (31 
December 2009). Comparisons are to MediaWiki 1.14 alpha, SVN version r45277
(1 January 2009).

See http://translatewiki.net/wiki/MediaWiki_2009 for a wiki version of this 
message.

==Introduction==
* Localisation or L10n - the process of adapting the software to be as 
familiar as possible to a specific locale (topic of this message)
* Internationalisation or i18n - the process of ensuring that an application 
is capable of adapting to local requirements (out of scope of this message)

MediaWiki has a user interface definition for 362 languages (up from 348). Of 
those, at least 39 language codes are duplicates and/or serve a usability 
purpose[5]. Reporting on them is not relevant, so MediaWiki in its current 
state supports 323 languages (up from 322). MediaWiki has 346 core language 
files (up from 326), of which 38 are redirects from the duplicates/usability 
group or just empty[6]. So MediaWiki has an active in-product localisation 
for 308 languages (up from 299).
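
The two subtractions above can be checked against the code lists in footnotes 
[5] and [6]. The following is a minimal sketch in Python; the variable and 
function names are made up for illustration, and only the totals (362 and 346) 
and the code lists come from this message.

```python
# Language codes copied from footnotes [5] and [6] of this message.
DUPLICATE_OR_USABILITY = """
als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro, gan,
got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq, simple,
sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk, zh-min-nan, zh-mo,
zh-my, zh-tw, zh-yue
"""
REDIRECT_OR_EMPTY = """
als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan,
hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq,
simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my,
zh-sg, zh-yue
"""


def parse_codes(text):
    """Split a comma-separated footnote into a list of language codes."""
    return [code.strip() for code in text.split(",") if code.strip()]


UI_DEFINITIONS = 362       # languages with a user interface definition
CORE_LANGUAGE_FILES = 346  # core language files

duplicates = parse_codes(DUPLICATE_OR_USABILITY)
redirects = parse_codes(REDIRECT_OR_EMPTY)

# Supported languages: UI definitions minus duplicate/usability codes.
supported_languages = UI_DEFINITIONS - len(duplicates)
# Active localisations: core language files minus redirects and empty files.
active_localisations = CORE_LANGUAGE_FILES - len(redirects)

print(len(duplicates), supported_languages)
print(len(redirects), active_localisations)
```

Counting the footnote entries rather than hard-coding the counts keeps the 
totals tied to the lists themselves.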

The MediaWiki core product has several areas that can be localised:
* regular messages that can and should be localised (2,369 - up 9% from 2,168)
* optional messages that can be localised, which is mostly done for languages 
not using a Latin script (187 - up 8% from 173)
* ignored messages that should not be localised (152 - up 2% from 149)
* namespace names and namespace aliases (17 - no change)
* magic words (142 - up 8% from 132)
* special page names (88 - up 2% from 86)
* other (directionality, date formats, separators, book store lists, link 
trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done 
on the regular messages only.

MediaWiki is more than just the core product. On 
http://www.mediawiki.org/wiki/Category:All_extensions 1,500 extensions (up 25% 
from 1,200) have some kind of documentation. This analysis only takes the code 
currently present in svn.wikimedia.org/svnroot/mediawiki/trunk into account. 
The source code repository contains give or take 445 extensions (up 20% from 
370). Most extensions in the MediaWiki Subversion repository now use the 
reference implementation for i18n. Currently 8,200 messages for MediaWiki 
extensions can be localised in a consistent way (up 37% from 6,000).

==MediaWiki localisation in practice==
MediaWiki localisation has moved further towards a centralised collaborative 
process in translatewiki.net in the past year. In 2008 some wikis were 
still translating in their own MediaWiki: namespace. The introduction of the 
LocalisationUpdate extension[7], especially in the Wikimedia Foundation wikis, 
has removed the last advantage that local translation had over centralised 
translation: instant gratification. Translations that are committed to 
Subversion can be added to wikis without requiring software updates, as often 
as desired.

Few to no translations are submitted through the Bugzilla ticketing system 
or directly by SVN committers. Exceptions are the localisations of Hebrew, 
Cantonese, Simplified Chinese, Traditional Chinese, Classical Chinese and 
Persian, which are still actively maintained in SVN alongside regular 
contributions from the centralised system.

==The past, the present and the future==
MediaWiki localisation has always been a volunteer effort, and I expect that it 
will remain so. 2009 brought a successful Google Summer of Code project, 
executed by Niklas Laxstrom[8,9], and the Wikimedia Foundation is supporting 
the localisation that takes place at translatewiki.net[10]. Not only 
MediaWiki, but all Open Source projects that are supported there[11] benefit 
from these developments. We want to keep using the Translate extension 
technology and expand on it, as well as nourish our translator base of nearly 
2,000 translators by providing them with better tooling and more projects in 
2010. Vereniging Wikimedia Nederland[12], the Dutch Wikimedia Chapter, has 
granted 2,000 euros to Stichting Open Progress[13] for the translatewiki.net 
Translation Rallies, which motivated translators to make more than 60,000 
new translations for MediaWiki and its extensions in August and December 2009.

New opportunities lie in better support for Translation Memory technology and 
in more supported projects, to grow the community and allow translators to 
spend their time as productively as possible, while still offering all the 
socialising and collaboration features of MediaWiki. At the Google Summer of 
Code Mentor Summit there was interest from the KDE Documentation Project[14], 
the PHP Documentation Project, Pidgin, wxWidgets, and other projects. For 
translatewiki.net staff this was a confirmation that our approach works. The 
Translate extension, however, needs more development. If you want to work on an 
exciting extension that makes a difference in multi-language support for Open 
Source software and for MediaWiki content pages that require structured 
translation, check out the Translate extension and help us make it better. 
Your help *is* needed and most welcome!

The Wikimedia Strategic Planning process that is currently taking place also 
allows for a broader perspective on the localisation of MediaWiki in a 
Wikimedia context[15]. Support for several dozen MediaWiki extensions in the 
Wikia code repository is expected within the next few weeks. Wikimedia is 
including, or will soon include, a localisation score for language projects in 
its statistics, so that in a year we expect to be able to analyse whether 
localisation is a requirement for a rise in usage or a consequence of it[16].

==MediaWiki localisation statistics==
Daily statistics for MediaWiki and extension localisation have been available 
for the past two years[17]. Over the same period, (arbitrary) milestones have 
been set for four collections of MediaWiki-related messages. For the usability 
of MediaWiki in a particular language, the group 'core most used' is the most 
important: a language must reach the milestone for this first group for 
MediaWiki to have 'minimal support' for that language. Reaching further 
milestones indicates the maturity of a localisation:
* core most used (469 messages): 98%
* core (2,369 messages): 90%
* Wikimedia extensions (2,700 messages): 90%
* extensions (8,200 messages): 65%
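
As a rough illustration of how these milestones apply to a single language, 
here is a minimal sketch in Python. The function name and the example counts 
are hypothetical; the group sizes and percentages come from the list above 
(the two extension totals are the rounded figures quoted there), and reaching 
a threshold exactly is treated as passing.

```python
# Milestones from the list above: group -> (number of messages, required %).
MILESTONES = {
    "core most used": (469, 98),
    "core": (2369, 90),
    "Wikimedia extensions": (2700, 90),
    "extensions": (8200, 65),
}


def passed_milestones(translated):
    """Return the groups whose milestone a language has reached.

    `translated` maps a group name to the number of translated messages.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    passed = []
    for group, (total, required_percent) in MILESTONES.items():
        if translated.get(group, 0) * 100 >= required_percent * total:
            passed.append(group)
    return passed


# Hypothetical translation counts for one language (not real data).
example = {
    "core most used": 460,
    "core": 2000,
    "Wikimedia extensions": 2500,
    "extensions": 5000,
}
print(passed_milestones(example))
```

With these made-up counts the language has 'minimal support' (460 of 469 is 
just over 98%) and passes the Wikimedia extensions milestone, but not yet the 
core or extensions milestones.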

Currently the following numbers of languages have passed the above 
milestones[18]:
* core most used: 147 (45.5% of supported languages - up 35% from 109 - goal 
of 130 passed)
* core: 82 (25.4% of supported languages - up 21% from 68 - goal of 90 missed 
by 203 translations)
* Wikimedia extensions: 44 (13.6% of supported languages - up 22% from 36 - 
goal of 50 missed by 1,500 translations)
* extensions: 39 (12.1% of supported languages - up 86% from 21 - goal of 30 
passed)
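
The shares and growth figures in this list can be recomputed from the raw 
counts. A minimal sketch in Python, assuming the 323 supported languages 
mentioned in the introduction as the denominator; the helper names are made up 
for illustration.

```python
SUPPORTED_LANGUAGES = 323

# group -> (languages passing now, languages passing a year earlier)
PASSED = {
    "core most used": (147, 109),
    "core": (82, 68),
    "Wikimedia extensions": (44, 36),
    "extensions": (39, 21),
}


def share_of_supported(count, total=SUPPORTED_LANGUAGES):
    """Percentage of supported languages, rounded to one decimal."""
    return round(100 * count / total, 1)


def growth_percent(now, before):
    """Year-on-year growth as a whole percentage."""
    return round(100 * (now - before) / before)


for group, (now, before) in PASSED.items():
    print(f"{group}: {share_of_supported(now)}% of supported languages, "
          f"up {growth_percent(now, before)}% from {before}")
```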

I think the changes in the past year are very satisfying. MediaWiki 
localisation has again improved enormously. Two of the four 
goals I set in last year's e-mail have not been reached (only one of four 
goals was reached for 2008). We nearly got there, though. Currently MediaWiki 
core contains 377,394 messages (up 24% from 303,863 at the end of 2008).

==Conclusion==
So... Is MediaWiki doing well on localisation? Just like the past two years, 
my personal opinion is that we do a proper job, but can still do a lot better. 
After all, MediaWiki is the engine that runs a top 5 site in the world 
committed to creating "a world in which every single human being can freely 
share in the sum of all knowledge." Observing that there are also an estimated 
hundred thousand MediaWiki installations out there, more than 250 Wikipedias 
that all use the Wikimedia Commons media repository, and that only 147 
languages out of 323 have a minimal localisation, there is a lot of room for 
improvement. More realistically: the work will never be done, but the least we 
can do is try to get there :).

Last year I mentioned languages from Africa performing way below average. I am 
sad to conclude that this has not changed considerably. In an overview with a 
weighted score for the localisation level of MediaWiki in a Wikimedia 
context[19], the largest African languages have the lowest score (52 out of 
100). Large languages spoken on multiple continents and large languages from 
Europe are doing best (100 and 99 out of 100 respectively). Oriya, Zulu, 
Burmese and Urdu are among the large languages with the worst 
localisation scores. It is my personal aim to work towards an average L10n 
score of 83 for the 50 largest languages in the world by the end of September 
2010.

We have all the tools to successfully localise MediaWiki into any of the 7,000 
or so languages that have been classified in ISO 639-3. We only need one 
person per language to make an effort and make it happen. Reaching the first 
milestone (core most used) takes about six hours of work. Using 
translatewiki.net or the Gettext file, little to no technical knowledge is 
required. Knowledge of MediaWiki is a plus.

This was the pitch, basically the same as in 2007 and 2008, with even more 
experience and data. Goals for MediaWiki localisation by the end of 2010 are 
ambitious, but still realistic with the right effort:
* core most used: 170 languages with 98% or more localised
* core: 105 languages with 90% or more localised
* Wikimedia extensions: 65 languages with 90% or more localised
* extensions: 50 languages with 65% or more localised
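
To put these goals next to the current state, the gap per group can be worked 
out from the counts reported in the statistics section. A minimal sketch in 
Python; the function name is made up for illustration.

```python
# group -> (languages passing at the end of 2009, goal for the end of 2010)
GOALS = {
    "core most used": (147, 170),
    "core": (82, 105),
    "Wikimedia extensions": (44, 65),
    "extensions": (39, 50),
}


def languages_still_needed(current, goal):
    """Number of additional languages that must pass the milestone."""
    return max(goal - current, 0)


for group, (current, goal) in GOALS.items():
    print(f"{group}: {languages_still_needed(current, goal)} more languages")
```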

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 
2010.

Cheers!

Siebrand Mazeland

[1] http://lists.wikimedia.org/pipermail/translators-l/2007-December/000571.html
[2] http://translatewiki.net/wiki/User:Siebrand/An_update_on_localisation_in_MediaWiki_%282008%29
[3] https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=Internationalization&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED
[4] http://translatewiki.net/wiki/User:Siebrand#Bugs
[5] als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro, gan, 
got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq, simple, 
sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk, zh-min-nan, zh-mo, 
zh-my, zh-tw, zh-yue
[6] als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan, 
hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq, 
simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my, 
zh-sg, zh-yue
[7] http://www.mediawiki.org/wiki/Extension:LocalisationUpdate
[8] http://socghop.appspot.com/gsoc/student_project/show/google/gsoc2009/wikimedia/t124025074637
[9] http://laxstrom.name/blag/2009/09/01/gsoc-wrap-up-translate-extension/
[10] http://techblog.wikimedia.org/2009/10/supporting-translatewiki-net/
[11] http://translatewiki.net/wiki/Project_list
[12] http://nl.wikimedia.org
[13] http://www.openprogress.org/Stichting_Open_Progress
[14] http://translatewiki.net/wiki/Project:KDE_Documentation
[15] http://strategy.wikimedia.org/wiki/Localisation
[16] http://stats.wikimedia.org/EN/TablesCurrentStatusVerbose.htm
[17] http://translatewiki.net/wiki/Translating:Group_statistics
[18] http://translatewiki.net/wiki/Translating:Group_statistics_in_time
[19] http://translatewiki.net/wiki/Project:MediaWiki_localisation_in_the_50_most_spoken_languages




