Richard, the spreadsheet can be found here [1]. I compiled the list again, and it now includes some more languages (Swahili, for example) that didn't have numbers in Ethnologue's data.
On Thu, May 28, 2015 at 7:04 AM, Asaf Bartov abartov@wikimedia.org wrote:
For some of these languages, I don't see that this makes sense, in terms of investment versus impact, or in terms of putting the cart before the horse. Tsonga, for example, seems to have precisely one active editor[1] -- no doubt, our colleague Dumisani Ndubane (CCed as a courtesy). While it is indeed his native language, it does not seem like a good investment of effort to translate hundreds of pages and thousands of strings into Tsonga when he (and, with overwhelming likelihood, any other literate speaker of Tsonga) is also fluent (and educated in) English. Xhosa, Hausa and Zulu are in the same class[2][3][4].
It's not just about English. I've realized that there is a third group of languages: those whose speakers are in fact fluent in other languages of the same area. Thus, those numbers could mislead. But this is initial data, for further analysis.
... Again, there already exists a community of dedicated contributors to the Urdu Wikipedia[5] (apparently more from India than from Pakistan, no doubt partially due to script issues[6]). Some of you, particularly in the last year, have been energetically mentoring newcomers and doing outreach activities. Our colleagues Nisar Ahmad Syed and Muzammiluddin Syed (CCed) are two such volunteers. Now, what, precisely, are you suggesting?
Members of our community who are native speakers of a language widely used as an L2 (or similar) have a wider responsibility for keeping translations up to date. In the present situation, their responsibility to [usually] minority languages is even greater than their responsibility toward their own language, as it's a matter of whether that language will or won't have Wikimedia project(s).
For example, if we get a community that is interested in and capable of having a number of Wikimedia projects, but is not that fluent in English, we need to cover all important MediaWiki messages in their L2 and keep them up to date.
There are two problems that a group of enthusiasts in such a linguistic situation faces:
* They have to translate ~500 messages, which is not in itself a time-consuming problem. However, even Hindi and Arabic don't have 100% of the messages from that group translated [2] (likely one message is the issue). If we count on, let's say, dozens of volunteers educated in Hindi or Arabic but far from being fluent in English, that could be a significant problem. If they want to create the next project, let's say Wiktionary, to gather lexical data (and I think the second two message groups have to be translated for that; not quite sure, I will have to take a look into the LangCom list archives), they have an even bigger problem. If they want to create a third one, let's say Wikibooks, to create educational textbooks, they are likely facing a problem which they won't be able to solve.
* MediaWiki is developing and messages are changing. While it doesn't matter much for a major language to have 99% rather than 100% of the most-used messages translated, a new one won't get a project if it's not at 100%. (That's the situation as it is; I don't like it, but I can't change it.)
It turns out that the threshold for very small languages is much higher than for the big ones. And we have to find a way to help them. L2 languages are a good starting point.
Further... We don't have statistics related to the translation of the articles which define Wikipedia, Wikimedia and other projects. And those have to be translated by Wikimedians who are well versed in the matter and fully understand the meaning of the content. (I don't even want to open the question of why we have those defining documents not on Meta but on English-language projects, after more than a decade of the Wikimedia Foundation's existence.) So, even if we use the customary rule "translate it from English Wikipedia", we are pretty much incapable of following the translation paths. And that's quite important to us. I mean, so important that it defines who we are and what we expect from newcomers.
And I haven't said anything about current events, of which the most important ones should be passed on to as many Wikimedians as possible, including those who don't speak English. (It isn't realistic to expect information to reach a Wikimedian who speaks just his or her native language when Wikipedia in that language has a few dozen active editors or fewer.)
Now, what I suggest...
First of all, I'd like to see the most important content translated into the languages used for wider communication. Which methods would be used is a different question, and while I will suggest some of them, I really don't mind if a different approach or approaches are used.
If we want to do that, we should define "the starter kit": What is most important to translate? How often do those documents change? How many translators do we need to keep the messages and documents up to date? How can we increase the number of translators? What could we use to improve the response to calls for translation? What's the minimum capacity a community should have to maintain the various translated segments?
Thinking generally, it's probably a good idea, for a start, that WMF promote translation editathons among chapters and user groups, following recommendations based on the starter kit: "Doesn't your area have those languages spoken? Would you like to organize annual translation editathons?"
But, as I said, the last paragraph is one particular methodology. If something else would work better, I am for it.
[1] https://docs.google.com/spreadsheets/d/1dbL-aJAStMuGlnxq5Md0s0zCmX--35JzRKcX...
[2] https://translatewiki.net/wiki/Translating:Group_statistics