Alec Conroy wrote:
The recent elections showed us that language issues and translation are something we have to take very seriously from now on. As a first step towards improving communication, it seems like we should get an idea of which users speak which languages?
We could directly ask them to tell us, but upon reflection, the information is already hidden in our database. A multilingual user is one who actively edits projects in two different languages.
Many users have already told us, by using Babel templates. Those also show how proficient they consider themselves in each language (native level, basic skills...).
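As a rough sketch of how that information could be read, assuming we already have the wikitext of a user page at hand: the function below pulls language codes and levels out of a Babel box. The name parse_babel is made up for illustration, and I am assuming the usual conventions that a bare code with no level means native ("N") and that levels run from 0 to 5.

```python
import re

# Matches {{Babel|...}} and {{#babel:...}} boxes (case-insensitive).
BABEL_RE = re.compile(r"\{\{\s*(?:#babel:|babel\s*\|)([^{}]*)\}\}", re.IGNORECASE)

# Assumed set of proficiency suffixes: 0-5 plus N for native.
LEVELS = {"0", "1", "2", "3", "4", "5", "N"}


def parse_babel(wikitext):
    """Return {language_code: level} for every Babel box in the wikitext."""
    levels = {}
    for match in BABEL_RE.finditer(wikitext):
        for entry in match.group(1).split("|"):
            entry = entry.strip()
            if not entry or "=" in entry:
                continue  # skip empty slots and named parameters
            lang, sep, level = entry.rpartition("-")
            if sep and level.upper() in LEVELS:
                levels[lang.lower()] = level.upper()
            else:
                # No recognised level suffix (e.g. "es" or "pt-br"):
                # treated as native, which is an assumption.
                levels[entry.lower()] = "N"
    return levels


print(parse_babel("{{Babel|es|en-3|fr-1}}"))
```

Self-declared levels are only as reliable as the users filling them in, so this would complement the edit-based numbers rather than replace them.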
In devising a comprehensive translation strategy, we need to know how interconnected any two given projects are. We also need to know how connected any given project is to English, since it's our working language.
There's also the motivation factor. I am not much of a translator, although I have fixed translations that I came across simply while browsing as a regular user, and that had been sitting there for days. From what I have seen in the past, many translations aren't done by skilled people but by whoever was motivated enough to do them, and the result is sometimes close to machine-translation quality. However, as the people running the event obviously don't know every language, they have to rely on the few users who translate, and bad texts pass as 'translated'.
We need to pay special attention to languages that are very 'distant' from English, distant in the sense of having few members who are fluent in both English and the language in question.
Could someone aid me in getting this data, or explaining why I don't need it or why we already have it, etc?
Specifically, I'm looking for:
# For each non-English-language project, how many of their active users are ALSO active on an English-language project? (the answer should be a single whole number for each project)
First point: define being active. That should be something like 'more than X non-minor edits in the last Y weeks.'
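To make that definition concrete, here is a minimal sketch. It assumes the edits have already been exported as (user, project, timestamp, is_minor) tuples; that layout and the names active_users, min_edits and weeks are mine, not the actual database schema, and X and Y are left as parameters since they are still to be decided.

```python
from collections import Counter
from datetime import datetime, timedelta


def active_users(edits, now, min_edits, weeks):
    """Return the set of (user, project) pairs that count as active.

    A pair is active when the user made MORE than `min_edits` non-minor
    edits to the project in the `weeks` weeks before `now`.
    `edits` is an iterable of (user, project, timestamp, is_minor).
    """
    cutoff = now - timedelta(weeks=weeks)
    counts = Counter(
        (user, project)
        for user, project, timestamp, is_minor in edits
        if not is_minor and timestamp >= cutoff
    )
    return {pair for pair, n in counts.items() if n > min_edits}


# Made-up sample data, only to show the call.
now = datetime(2000, 6, 1)
sample = [
    ("ana", "eswiki", now - timedelta(days=1), False),
    ("ana", "eswiki", now - timedelta(days=2), False),
    ("ana", "eswiki", now - timedelta(days=3), True),    # minor, ignored
    ("ana", "enwiki", now - timedelta(weeks=10), False),  # too old, ignored
    ("ana", "enwiki", now - timedelta(days=1), False),
]
print(active_users(sample, now, min_edits=1, weeks=4))
```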
I see a problem in that you are presenting it as a symmetric relationship, while I don't think it should be. I could be very good at translating something into my mother tongue, but inept at translating it in the opposite direction. This is especially true when translating between similar languages, where a non-speaker can easily grasp the meaning.
Also, someone who routinely translates articles from enwiki to xzwiki would have the exact profile you want to discover, but could be skipped for not having enough edits on enwiki.
# For any two projects, how many users are there who are active on both? (answer is a square matrix, roughly 750x750)
# For any two languages, how many users appear to speak both languages? (answer is a square matrix, roughly 750x750)
I think the answer would actually be three-dimensional, since for each cell you would have a list of people, the number being just a summary.
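A small sketch of what I mean, assuming we start from a set of (user, project) pairs already judged active; the names shared_users and cell_counts are invented for this example. Each cell holds the set of users, and the matrix of numbers is derived from it.

```python
from collections import defaultdict
from itertools import combinations


def shared_users(active_pairs):
    """Map each unordered project pair to the set of users active on both.

    `active_pairs` is an iterable of (user, project). Pairs of projects
    are stored once, as a tuple sorted alphabetically.
    """
    projects_by_user = defaultdict(set)
    for user, project in active_pairs:
        projects_by_user[user].add(project)

    cells = defaultdict(set)
    for user, projects in projects_by_user.items():
        for a, b in combinations(sorted(projects), 2):
            cells[(a, b)].add(user)
    return dict(cells)


def cell_counts(cells):
    """Collapse the per-cell user lists into the plain matrix of counts."""
    return {pair: len(users) for pair, users in cells.items()}


# Made-up sample data, only to show the call.
pairs = {
    ("ana", "eswiki"), ("ana", "enwiki"),
    ("bo", "enwiki"), ("bo", "eswiki"),
    ("cy", "dewiki"),
}
cells = shared_users(pairs)
print(cells)
print(cell_counts(cells))
```

Only non-empty cells are stored, which matters with roughly 750 projects since most pairs will share nobody. The per-project number for English is just the row of counts involving the English project. Note that this is symmetric by construction, so it says nothing about the direction in which someone can translate.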
Does anyone know how to pull this out of the database? It's an important question for us to recruit translators and really just assess "where we are" in terms of inter-project language capabilities.
Alec
I think I can build you something if you give me appropriate values for the above definition.
Cheers