On Wed, Jun 15, 2011 at 7:42 AM, Platonides <Platonides@gmail.com> wrote:
> Alec Conroy wrote:
>> We could directly ask them to tell us, but upon reflection, the information is already hidden in our database. A multilingual user is one that actively edits two projects of different languages.
>
> Many users have already told us, by using babel templates. Those also show how much confidence they have in those languages (native level, basic skills...).
Babel templates are great: if every user had them, we'd be set. Unfortunately, anyone who knows enough to use a babel template is probably already 'tied in' to the global community and so doesn't need outreach (though this assumption may be false).
> There's also the motivation factor.
You said a mouthful. Knowing that people can translate is not at all the same as being able to count on them actually doing it. We just found that out, which is why we need to start building a translator network now rather than wait until next year.
> First point: define being active. That should be something like 'more than X non-minor edits in the last Y weeks.'
I'm flexible. The point of the activity filter is just to cut the data down to a manageable size. If we want to count everyone as active at this stage, that would work too. I suggest 'last touched within 30 days', but that's totally arbitrary.
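The activity filter and the "active in two languages" test could be sketched roughly as below. This is a minimal sketch under stated assumptions: the `Edit` record, its fields, and the function names are hypothetical (not the actual MediaWiki database schema), and the defaults (more than 5 non-minor edits in the last 30 days) are placeholders for the X and Y still to be decided.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Edit:
    """One edit record (hypothetical shape, not the real MediaWiki schema)."""
    user: str
    language: str        # language of the project edited, e.g. "en"
    timestamp: datetime
    minor: bool


def active_languages(edits, now, more_than=5, window=timedelta(days=30)):
    """Map each user to the languages in which they made strictly more than
    `more_than` non-minor edits during the `window` ending at `now`.
    Both defaults are placeholders for the X and Y still to be chosen."""
    counts = defaultdict(int)
    for e in edits:
        if not e.minor and timedelta(0) <= now - e.timestamp <= window:
            counts[(e.user, e.language)] += 1
    active = defaultdict(set)
    for (user, lang), n in counts.items():
        if n > more_than:
            active[user].add(lang)
    return active


def multilingual_users(edits, now, **kwargs):
    """Users active in at least two different languages (languages sorted)."""
    return {
        user: sorted(langs)
        for user, langs in active_languages(edits, now, **kwargs).items()
        if len(langs) >= 2
    }


# Usage with made-up data: Alice edits en and es, Bob only en.
now = datetime(2011, 6, 15)
edits = (
    [Edit("Alice", "en", now - timedelta(days=d), False) for d in range(6)]
    + [Edit("Alice", "es", now - timedelta(days=d), False) for d in range(6)]
    + [Edit("Bob", "en", now - timedelta(days=d), False) for d in range(10)]
)
print(multilingual_users(edits, now))  # {'Alice': ['en', 'es']}
```

Grouping by language rather than by project means that, for example, edits to two English-language projects would not make someone count as multilingual.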
> I see a problem in that you are presenting it as a symmetric relationship, while I don't think it should be.
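One way to avoid treating the relationship as symmetric, sketched purely as an illustration: assume (hypothetically) that a user's most-edited language is their strongest one, and count directed pairs from that language to each other language they edit, so that es->en and en->es are tallied separately. The input shape and the `directed_pairs` function are hypothetical.

```python
from collections import Counter


def directed_pairs(edit_counts):
    """Count directed language pairs (strongest -> other).

    edit_counts maps user -> {language: number_of_edits} (hypothetical input).
    Assumption: a user's most-edited language is their strongest one; on a
    tie, max() keeps the first language listed for that user.
    """
    pairs = Counter()
    for per_lang in edit_counts.values():
        if len(per_lang) < 2:
            continue  # monolingual users contribute no pairs
        home = max(per_lang, key=per_lang.get)
        for lang in per_lang:
            if lang != home:
                pairs[(home, lang)] += 1
    return pairs


# Made-up example: two users centered on es, one on en, one monolingual.
counts = {
    "Alice": {"es": 120, "en": 15},
    "Bob": {"en": 300, "es": 4},
    "Carol": {"es": 50, "en": 10},
    "Dave": {"de": 80},
}
pairs = directed_pairs(counts)
print(pairs[("es", "en")], pairs[("en", "es")])  # 2 1
```

Where a user has babel boxes, their declared levels (native vs. basic) would likely be a better signal of direction than raw edit counts.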
Another excellent caveat. I should say that my initial attempt at getting these kinds of estimates was to look at worldwide language-overlap statistics and just assume that Wikimedians are "average humans", which they clearly aren't. That would give us only a very rough picture.
Analyzing actual edit patterns will give us a better view, but it will still be less precise than babel boxes or explicit self-identification as a translator. At some point we may want to ask users directly about their language skills.
Alecmconroy