I have not seen a comprehensive overview of MediaWiki localisation discussed on the lists I am posting this message to, so I thought I might give it a try. All statistics are based on MediaWiki 1.12 alpha, SVN version r29106.
==Introduction==
*Localisation or L10n - the process of adapting the software to be as familiar as possible to a specific locale (in scope)
*Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope)
MediaWiki has a user interface (UI) definition for 319 languages. Of those, at least 17 language codes are duplicates and/or serve a usability purpose[1]. Reporting on them, however, is not relevant, so MediaWiki in its current state supports 302 languages. To be able to generate statistics on localisation, a MessagesXx.php file must be present in languages/messages. There are currently 262 such files, of which 16 are redirects from the duplicates/usability group[2]. So MediaWiki has an active in-product localisation for 246 languages; the remaining 56 languages have an interface, but simply fall back to English.
The MediaWiki core product recognises several collections of localisable content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (1726)
* optional messages that can be localised, which usually only happens for languages not using a Latin script (161)
* ignored messages that should not be localised (100)
* namespace names and namespace aliases (17)
* skin names (7)
* magic words (120)
* special page names (76)
* other (directionality, date formats, separators, book store lists, link trail, and others)
Localisation of MediaWiki revolves around all of the above. Reporting is done on the normal messages only.
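For orientation, these collections live together in each MessagesXx.php file as plain PHP arrays. A rough sketch of the layout (the Dutch-flavoured values are illustrative, not copied from the real MessagesNl.php):

<?php
# Namespace names
$namespaceNames = array(
  NS_SPECIAL => 'Speciaal',
  NS_TALK    => 'Overleg',
);

# Magic words: the first element flags case sensitivity,
# the rest are localised and canonical forms
$magicWords = array(
  'redirect' => array( 0, '#DOORVERWIJZING', '#REDIRECT' ),
);

# Special page names
$specialPageAliases = array(
  'Recentchanges' => array( 'RecenteWijzigingen' ),
);

# The 'normal' messages that reporting is based on
$messages = array(
  'search'  => 'Zoeken',
  'history' => 'Geschiedenis',
);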
MediaWiki is more than just the core product. On http://www.mediawiki.org/wiki/Category:All_extensions some 750 extensions have some kind of documentation. This analysis is scoped to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk. The source code repository contains, give or take, 230 extensions. Of those 230 extensions, about 140 contain messages that can be visible in the UI in some use case (debugging excluded). Out of those 140, about 10 extensions have an exotic implementation for localisation or no localisation support at all (just English text in the code), and about 10 extensions appear to be outdated. I have seen about 5 different 'standard' implementations of i18n in extensions. Since MediaWiki 1.11 there is wfLoadExtensionMessages, though not many extensions use it yet for message handling. If you can help add standard i18n support to more extensions (an overview can be found at http://translatewiki.net/wiki/User:Siebrand/tobeadded) or help standardise L10n for extensions, please do not hesitate to do so.
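For extension authors who want to move to the standard mechanism, the pattern since 1.11 looks roughly like this (a sketch only; the extension name and message key are hypothetical, see the extensions in trunk for live examples):

# In the extension setup file, register the message file:
$wgExtensionMessagesFiles['MyExtension'] =
  dirname( __FILE__ ) . '/MyExtension.i18n.php';

# Load the messages before using them, e.g. when a special page runs:
wfLoadExtensionMessages( 'MyExtension' );

# MyExtension.i18n.php holds all localisations, keyed by language code:
$messages = array();
$messages['en'] = array(
  'myextension-desc' => 'Example extension',
);
$messages['nl'] = array(
  'myextension-desc' => 'Voorbeelduitbreiding',
);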
==MediaWiki localisation in practice==
Localisation of MediaWiki is currently done in the following ways that I am aware of:
* in local wikis: Sysops on local wikis shape and translate messages to fit their needs. This is done in wikis that are part of Wikimedia, Wikia, Wikitravel, corporate wikis, etc. This type of localisation has the fewest benefits for the core product and extensions, because it happens completely out of the scope of svn committers. I have heard Wikia supports languages that are not supported in the svn version; I would like some help in identifying and contacting these communities to try and get their localisations into the core product. Together with SPQRobin, I am trying to get what has been localised in local Wikipedias into the core product, and to recruit the users that worked on that localisation for a more centralised way of working (see Betawiki).
* through bugzilla/svn: A user of MediaWiki submits patches for core messages and/or extensions. These users are mostly part of a wiki community within Wikimedia. The patches are usually taken care of by committers (raymond, rotemliss, and sometimes others). Some users maintain a language directly in SVN. At the moment, 10-15 languages are maintained this way: Danish, German, Persian, Hebrew, Indonesian, Kazakh (3 scripts), Chinese (3 variants), and some others less frequently.
* through Betawiki: Betawiki was founded in mid-2005 by Niklas Laxström. In the years since, Betawiki has grown into a MediaWiki localisation community of over 200 users that, in each of the past few months, has contributed to the localisation of some 120 languages. Users that are only familiar with MediaWiki as a tool can localise almost every aspect of MediaWiki (except for the group 'other' mentioned earlier) in a wiki interface. The work of the translators is regularly committed to svn by nikerabbit and myself. Betawiki also offers a .po export that enables users to use more advanced translation tools for their translations; this option was added recently and no translations in this format have been submitted yet. Betawiki also supports translation of 122 extensions, aiming to support everything that can be supported.
==MediaWiki localisation statistics==
MediaWiki localisation statistics have been around since June 2005 at http://www.mediawiki.org/wiki/Localisation_statistics[3]. Traditionally, reports have focused on the complete set of core messages. Recently a small study was done into the usage of messages, which resulted in a set of almost 500 'most often used messages in MediaWiki', based on message usage on the Wikimedia cluster (http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki).
Until recently there were no statistics available on the localisation of extensions. Through groupStatistics.php in the Translate extension, these statistics can now be created. Reports are available on 'most often used MediaWiki messages', 'MediaWiki messages', and 'all extension messages supported by the Translate extension' (short: extension messages). Additionally, a meta extension group of 34 extensions used in the projects of Wikimedia has been created (short: Wikimedia messages). A regularly updated table of these statistics can be found at http://translatewiki.net/wiki/Translating:Group_statistics.
Some (arbitrary) milestones have been set for the four above-mentioned collections of messages. For the usability of MediaWiki in a particular language, the group 'core-mostused' is the most important: a language must reach that milestone for MediaWiki to be considered to have minimal support for it. Reaching the milestones for the first two groups is something the Wikimedia language committee is considering as a requirement for new Wikimedia wikis:
* core-mostused (496 messages): 98%
* wikimedia extensions (354 messages): 90%
* core (1726 messages): 90%
* extensions (1785 messages): 65%
Currently the following numbers of languages have passed the above milestones:
* core-mostused: 47 (15.5% of supported languages)
* wikimedia extensions: 10 (3.3% of supported languages)
* core: 49 (16.2% of supported languages)
* extensions: 7 (2.3% of supported languages)
==Conclusion==
So... Are we doing well on localisation, or do we suck? My personal opinion is that we are somewhere in between. Observing that there are some 250 Wikipedias that all use the Wikimedia Commons media repository, and that only 47 languages have a minimal localisation, we could do better. With Single User Login around the corner (isn't it?), we must do better. On the other hand, new language projects within Wikimedia all have excellent localisation of the core product. These languages include Asturian, Bikol Central, Lower Sorbian, Extremaduran, and Galician. But where is Hindi, for example, with currently only 7% of core messages translated?
With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries, and products like NGO-in-a-box including MediaWiki, the potential of MediaWiki as a tool for creating and preserving knowledge in the languages of the world is huge. We have to tap into that potential, and *you* (yes, I am glad you read this far and are now reading my appeal) can help. If you know people that are proficient in a language and like contributing to localisation, please point them in the right direction. If you know of organisations that can help localise MediaWiki, please approach them and ask them to help.
We now have all the tools to successfully localise MediaWiki into any of the 7,000 or so languages that have been classified in ISO 639-3. We only need one person per language to make it happen. Reaching the first two milestones (core-mostused and wikimedia extensions) takes about 16 hours of work, and using Betawiki or the .po export requires little to no technical knowledge.
This was the pitch. How about we aim to at least double the numbers by the end of 2008 to:
* core-mostused: 120
* wikimedia extensions: 50
* core: 90
* extensions: 20
I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2008.
Cheers!
Siebrand Mazeland
[1] als,crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[2] crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[3] older locations are http://www.mediawiki.org/wiki/Localisation_statistics/stats and
http://meta.wikimedia.org/wiki/Localization_statistics
Oversight is being overused in cases where deletion would be appropriate, because single-revision deletion isn't possible ...
Any progress on single-edit deletion?
- d.
---------- Forwarded message ----------
From: Dmcdevit <dmcdevit(a)cox.net>
Date: 19 Feb 2008 14:10
Subject: Re: [Foundation-l] misleading advice on oversight on wikizine
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
Thomas Dalton wrote:
> I disagree. There is an increased need for oversight (it's marginal,
> though). Before, when one of the 3 situations you quote came up, we had
> a choice between oversight and simple deletion. Simple deletion was
> fine in the less sensitive cases. Now, for large pages, that option is
> gone, so oversight will need to be used in every case, and that
> results in a greater need for oversight.
>
There is no room for disagreement here. Oversight is an extreme tool governed by a Board-supported policy on its use, which involves immediate loss of privileges for misuse. Oversight is not simply deletion of a single revision: it is the removal of that revision entirely from the history, so that no administrator can see it, or even see a log that it was done, and it can only be reversed with the help of a developer. We can certainly say that we hope single-revision deletion becomes easier with new features added to MediaWiki in the future, but it is inappropriate to suggest that oversight can fulfill this role.
Dominic
Hello Everyone,
I am trying to add some content to the wikitext on 'Save'. The formatting blows up when I refresh the page.
Example: an h2 header shows as the following after doing
$element[$i] = $element[$i] . " a ":
== Call Info == a
I am using the ArticleSave hook. It seems that if I add spaces or newlines it works, but when I add any text it blows up.
Please find the code snippet:

// Register the handler on the ArticleSave hook:
$wgHooks['ArticleSave'][] = 'functionA';

function functionA( &$article, &$user, &$text, &$summary, $minor,
  $watch, $sectionanchor, &$flags
) {
  $element = explode( "\n", $text );
  for ( $i = 0; $i < count( $element ); $i++ ) {
    $element[$i] = $element[$i] . " ";          // this works
    // $element[$i] = $element[$i] . " \n\n ";  // this works
    // $element[$i] = $element[$i] . " a ";     // this does not
  }
  $text = implode( "\n", $element );
  return true; // let the save continue
}
Possible problem area: it seems the problem is with the doHeadings function in Parser.php, where a regular expression is checked. Once the text is added, the parser seems to no longer recognise the line as a header.
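For reference, the heading pass looks roughly like this (paraphrased from Parser.php in 1.12):

function doHeadings( $text ) {
  for ( $i = 6; $i >= 1; --$i ) {
    $h = str_repeat( '=', $i );
    $text = preg_replace( "/^$h(.+)$h\\s*$/m",
      "<h$i>\\1</h$i>", $text );
  }
  return $text;
}

The $h\s*$ part requires the line to end with the same run of '=' signs, followed only by optional whitespace. Appending " a " after the trailing "==" therefore stops the line from matching, while appending spaces or newlines still matches, which would explain the behaviour above.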
It will be nice if someone can provide help on this.
Thanks
--Viral
Sorry about that - the 20080102-pages-articles dump I have /does/ include templates, so it probably is the rebuild step, as VasilievVV says.
Connel
> From: connelm(a)msn.com
> To: wikitech-l(a)lists.wikimedia.org
> Subject: RE: Data dump of wiktionary doesn't seem to have any template
> Date: Mon, 18 Feb 2008 17:24:21 +0000
>
>
> Hi Gary, I think you want http://download.wikimedia.org/enwiktionary/20080213/enwiktionary-20080213-p… instead - that one includes the Template: namespace.
>
> Connel
>
>
>> Message: 9
>> Date: Sun, 17 Feb 2008 21:01:29 +0530
>> From: "Apple Grew"
>> Subject: Re: [Wikitech-l] Data dump of wiktionary doesn't seem to have
>> any template pages.
>> To: "Wikimedia developers"
>> Message-ID:
>>
>> Content-Type: text/plain; charset=UTF-8
>>
>> Thanks for the reply.
>> Let's hope. I have started the rebuild links process. It has already
>> been running for over 6 hours and is still in progress.
>>
>> I will report here if that fixes the problem. BTW, any more insights?
>>
>> On Feb 17, 2008 8:45 PM, VasilievVV wrote:
>>>
>>> Apple Grew writes:
>>>> Hello everybody,
>>>>
>>>> I am new to this. I am currently trying to mirror en.Wiktionary.org on
>>>> my computer so that I can browse it smoothly even on my slow college
>>>> net and partly just for fun.
>>>>
>>>> I have installed MediaWiki (SVN 1.12 alpha). I have imported the
>>>> pages.article.xml.bz2 data dump
>>>> (http://download.wikimedia.org/enwiktionary/20080213/enwiktionary-20080213-p…).
>>>> All is fine, but it seems that it hardly contains any pages in the
>>>> Template namespace, hence I am getting weird Template:Shortcut, etc.
>>>> messages thrown up everywhere. I have tried manually creating some
>>>> pages and copying the code from the original site, but there are too
>>>> many templates to be copied. Are there any other data dumps that
>>>> actually have this data, or is there some alternative, saner method?
>>>> Please help, I am stuck. :(
>>>>
>>>> Regards,
>>>> Apple Grew
>>>> my blog @ http://applegrew.blogspot.com/
>>>
>>> I don't know much about database dumps, but it may be because you
>>> didn't rebuild links?
>>> --VasilievVV
>>>
>>>
>>
>>
>>
>> --
>> Apple Grew
>> my blog @ http://applegrew.blogspot.com/
On Feb 18, 2008 2:25 AM, <tstarling(a)svn.wikimedia.org> wrote:
> * Removed nonsense warning about the output of wfMsg() not being safe for inclusion in HTML.
I assume what Erik meant there is that it may output arbitrary HTML,
and we're trying to move away from allowing sysops to insert arbitrary
HTML into pages.
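In other words (a sketch; 'mymessage' is a hypothetical key whose on-wiki text contains markup):

# Returns the message text as-is, markup and all -- only safe
# if the caller escapes it or the output context is trusted:
$raw = wfMsg( 'mymessage' );

# Escapes the message text before returning it:
$safe = wfMsgHtml( 'mymessage' );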
Hi There,
My name is Fred Benenson and I'm a graduate student (as well as a free culture activist) doing research on Wikipedia. I'm working on a problem that requires an up-to-date version of the logging data on Wikipedia.
I was told to look here (after posting unsuccessfully to JIRA) for help on a problem I've found with a dump:
Despite its name, enwiki-latest-logging (available at
http://download.wikimedia.org/enwiki/latest/) is not actually the latest
logging information.
I've found that the most recent log_timestamp of any row is somewhere around February 2007. This means that the dump is about a year old, and is not the 'latest' version. It'd be great to get a fairly recent (within a month) version up live.
Let me know if there's any way I can help or make this easier.
Thanks.
Fred
On Thu, Feb 14, 2008 at 4:17 PM, <tbleher(a)svn.wikimedia.org> wrote:
> wfMsgExt() does not recognize the parameter 'parseline'.
Surely this was meant to be 'parseinline'?
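That is, assuming the standard option name, the intended call would be:

# 'parseinline' parses the message as wikitext but strips the
# surrounding block element, so the result fits inline:
$out = wfMsgExt( 'mymessage', array( 'parseinline' ) );

('mymessage' is a hypothetical key.)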
Dear MediaWiki Foundation,
I am writing to request an SVN account to host source code for an
extension that we are currently developing for MediaWiki. This
extension is being developed by a collaboration of two
Northeastern University students and CIM Engineering, Inc (dba
"CIM3").
The Software Idea:
With 2,073,813 articles in English and growing, Wikipedia is the world's largest collaboratively edited source of encyclopedic knowledge. As more and more people add content and use it as a reference for their research, it becomes important to know what data resides where on the wiki. Users generally handle this by bookmarking pages for future reference, but the bookmarking option in a web browser only lets you bookmark a URL, which generally points to a whole page. With the amount of content on any given page, it might still take a while to find the content the user is looking for. The aim of the project is to produce an extension that will add Purple Numbers and collaborative tagging capability to MediaWiki. Purple Numbers will provide the following added functionality in MediaWiki:
1. High resolution addressability in a wiki page:
With Purple Numbers, a MediaWiki user gets high resolution addressability within a wiki page. The purpose of Purple Numbers is simple: to produce HTML documents that can be addressed with high resolution (also called "fine granularity"). It does this by automatically creating name anchors with static (nids) and hierarchical (hids) addresses at the beginning of each node, and by displaying these addresses as links at the end of each node.
2. Transclusion:
Transclusion is the inclusion of part of a document into another document by reference. It is best explained by an example. Consider the following scenario: a user wants to display some data (a picture, chart, etc.) about X on a page that mentions X in some other context. With transclusion, the user can reference the data from the X page without copying and pasting it onto their own page; in MediaWiki wikitext, for instance, {{:X}} transcludes the content of page X. Since the data is referenced and not copied, any changes made to the data will be reflected on the user's page as well.
3. Collaborative tagging at node (Purple Numbers) level:
Collaborative tagging (also known as folksonomy, social classification, social indexing, and other names) is the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content. Freely chosen keywords are used instead of a controlled vocabulary.
Some Important Links:
Project Wiki:
http://project.cim3.net/wiki
Project Status:
http://project.cim3.net/wiki/PMWX
History of Purple Numbers:
http://community.cim3.net/cgi-bin/wiki.pl?PurpleNumbers
http://www.bootstrap.org/#9B
http://www.eekim.com/software/purple/purple.html
http://collab.blueoxen.net/cgi-bin/wiki.pl?PurpleNumbers
Transclusion:
http://en.wikipedia.org/wiki/Transclusion
Collaborative tagging:
http://en.wikipedia.org/wiki/Collaborative_tagging
The Team:
http://project.cim3.net/wiki/PMWX#The_Team
Thank You
Viral Gupta
Some people have noticed that, since a few days ago, they seem to get logged out of de.wikipedia after leaving the site or restarting their browser, even though they checked the stay-logged-in box. Another symptom is that it is mostly the main page that seems to suffer from this, and they get logged in again magically by viewing a new page. The problem is browser-independent. Only de.wikipedia seems to be affected.
If you can read German:
http://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#Komisches_Auslo…
This sounded like a caching issue to me. I could reproduce the problem: restart Firefox and be logged out. A force-reload "healed" this, and I got the main page with my user links again.
Did someone change squid/caching settings around the 13/14th? Maybe on
the European squids?
Magnus