Alec Conroy wrote:
The recent elections showed us that language issues and translation
are something we have to take very seriously from now on. As a first
step towards improving communication, it seems like we should get an
idea of which users speak which languages?
We could directly ask them to tell us, but upon reflection, the
information is already hidden in our database. A multilingual user is
one that actively edits two projects of different languages.
Many users have already told us, by using Babel templates. Those also
state how much confidence they have in each language (native level,
basic skills...).
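
As a rough illustration of how that self-reported data could be read
from a user page, here is a minimal sketch. It assumes the template
looks like {{Babel|es|en-3|fr-1}}, with a bare code meaning native
level; parse_babel is a made-up helper name, and real user pages may
use other variants that this does not handle.

```python
import re

# Assumed template shape: {{Babel|es|en-3|fr-1}}.
# A bare code (no "-level" suffix) is read as native ("N").
BABEL_RE = re.compile(r"\{\{\s*[Bb]abel\s*\|([^{}]*)\}\}")


def parse_babel(wikitext):
    """Return {language_code: level} for the first Babel template found."""
    match = BABEL_RE.search(wikitext)
    if match is None:
        return {}
    levels = {}
    for part in match.group(1).split("|"):
        part = part.strip()
        if not part or "=" in part:
            continue  # skip empty and named parameters
        code, sep, level = part.rpartition("-")
        if sep and (level.isdigit() or level.upper() == "N"):
            levels[code.lower()] = level.upper()
        else:
            # No recognisable level suffix: treat the whole part as a code.
            levels[part.lower()] = "N"
    return levels
```

The levels are kept per language rather than collapsed into "speaks it
or not", since the confidence level is exactly the extra information
the templates give us.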
In devising a comprehensive translation strategy, we need to know how
interconnected any two given projects are. We also need to know how
connected any given project is to English, since it's our working
language.
There's also the motivation factor. I am not much of a translator,
although I have fixed bad translations that I came across just while
browsing as a user, and that had been there for days.
From what I have seen in the past, many translations aren't done by
skilled people but just by people who were motivated enough to do
them, and the results are sometimes at an autotranslation-like level.
However, as the people running the event obviously don't know every
language, they have to rely on the few translating users, and bad texts
pass as 'translated'.
We need to pay special attention to languages that are very 'distant'
from English-- distant in the sense of having few members who are
fluent in both English and the language in question.
Could someone aid me in getting this data, or explaining why I don't
need it or why we already have it, etc?
Specifically, I'm looking for:
# For each non-English-language project, how many of their active
users are ALSO active on an English-language project? (the answer
should be a single whole number for each project)
First point: define being active. That should be something like 'more
than X non-minor edits in the last Y weeks.'
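
To make that definition concrete, here is a minimal sketch, assuming
the edits are available as (user, project, timestamp, is_minor) tuples.
That layout and the name active_users are made up for illustration and
are not the real database schema; min_edits and weeks are the X and Y
above.

```python
from collections import Counter
from datetime import datetime, timedelta


def active_users(edits, min_edits, weeks, now):
    """Map each project to the set of users considered active on it.

    edits: iterable of (user, project, timestamp, is_minor) tuples
           (a hypothetical layout, not the actual schema).
    A user is active on a project with more than `min_edits` non-minor
    edits there during the last `weeks` weeks before `now`.
    """
    cutoff = now - timedelta(weeks=weeks)
    counts = Counter()
    for user, project, timestamp, is_minor in edits:
        if not is_minor and timestamp >= cutoff:
            counts[(project, user)] += 1

    active = {}
    for (project, user), n in counts.items():
        if n > min_edits:
            active.setdefault(project, set()).add(user)
    return active
```

The threshold is applied per project, so someone with many edits on one
wiki and only a handful on another counts as active on the first only.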
I see a problem in that you are presenting it as a symmetric
relationship, while I don't think it should be. I could be very skilled
at translating something into my mother tongue, but inept at
translating it the opposite way.
Especially when translating between similar languages, where a
non-speaker can easily grasp the meaning.
Also, someone who routinely translates articles from enwiki to xzwiki
would have the exact profile you want to discover, but could be skipped
due to not having enough edits on enwiki.
# For any two projects, how many users are there who are active on
both? (answer is a square matrix, roughly 750x750 )
# For any two languages, how many users appear to speak both
languages? (answer is a square matrix, roughly 750x750)
I think the answer would actually be three-dimensional, since for each
cell you would have a list of people, the number being just a summary.
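
A minimal sketch of that idea, assuming we already have a mapping from
each project to its set of active users (the input shape and the names
shared_users and summarize are made up for illustration). Each cell
holds the actual users and the count is derived from it. Only
non-empty cells are stored, since most of a 750x750 matrix would be
empty.

```python
from itertools import combinations


def shared_users(active):
    """active: {project: set of users}, an assumed input shape.

    Returns {(p, q): users active on both}, with p < q alphabetically,
    keeping only pairs that share at least one user.
    """
    pairs = {}
    for p, q in combinations(sorted(active), 2):
        common = active[p] & active[q]
        if common:
            pairs[(p, q)] = common
    return pairs


def summarize(pairs):
    """Collapse each cell's user list to a count (the 'matrix' view)."""
    return {pair: len(users) for pair, users in pairs.items()}
```

Note that this is still the symmetric view: it says who is active on
both projects, not in which direction they could translate.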
Does anyone know how to pull this out of the database? It's an
important question for us to recruit translators and really just
assess "where we are" in terms of inter-project language capabilities.
Alec
I think I can build you something if you give me appropriate values for
X and Y in the above definition of an active user.
Cheers