Hi Amir,
Apologies for the late reply. Replies inline below.
Mike
On Fri, Aug 6, 2010 at 3:14 PM, Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il> wrote:
Dear Michael, I also thank you for joining the discussion. See my
question below.
2010/8/6 Michael Galvez <michaelcg(a)gmail.com>:
Also, as far as Indic languages go, I would ask if there's any chance
you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
is by far the most anemic of Indic-language Wikipedias, in spite of a
speaker population of 31 million.
Oriya is one of the languages we'd love to work on. We don't have any
activity on this today but if you have some Wikipedians who'd like to
help us get this off the ground, we'd love to get their contact info
and we can follow up from there.
How do you decide, in general, which languages to work with? If I
understand correctly, until now you have worked with Arabic, Swahili and
several Indian languages. But there are also languages in other parts
of the world whose Wikipedias could profit from such a project.
For example, the Greek Wikipedia is surprisingly small with only
54,500 articles (13 million speakers); Armenian has only 10,000
articles (6.7 million speakers); Georgian has 42,000 articles (4
million speakers). AFAIK, these language communities are largely
monolingual, that is, speakers of these languages may know English or
Russian, but they usually prefer to speak and write their own, unlike,
for example, speakers of Native American languages, many of whom use
English, Spanish or Portuguese online.
To decide which languages to target, we looked at several sets of metrics:
- we looked at the size of each Wikipedia based on words, articles, non-stub
articles (measured by articles over 2Kb), non-stub words (extrapolated),
from here:
http://stats.wikimedia.org/EN/
- we looked at the number of Internet users in each of those languages from
here:
http://www.internetworldstats.com/stats.htm
We also considered doing more refined measurements by accounting for Google
activity and mobile, but we ultimately went for the simple metrics above.
We took these numbers and calculated the number of words/articles/non-stub
articles/non-stub words per Internet user and normalized it with the
English Wikipedia = 1. We then focused on the largest languages that
had deficits vis-a-vis English.
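To make the calculation above concrete, here is a rough sketch in Python. It is not the spreadsheet we actually used. The function names, the dictionary inputs, and the choice to rank "largest" by Internet-user count are assumptions made for illustration, and no real figures are included. The size metric can be any of words, articles, non-stub articles, or non-stub words.

```python
def normalized_coverage(wiki_size, internet_users, baseline="en"):
    """Return size-per-Internet-user for each language, scaled so baseline == 1.0.

    wiki_size and internet_users are dicts keyed by language code
    (hypothetical inputs; fill them from the two sources listed above).
    """
    per_user = {
        lang: wiki_size[lang] / internet_users[lang]
        for lang in wiki_size
        if internet_users.get(lang)  # skip languages with missing or zero user counts
    }
    base = per_user[baseline]
    return {lang: value / base for lang, value in per_user.items()}


def largest_deficits(wiki_size, internet_users, baseline="en", top=5):
    """Languages scoring below the baseline, largest Internet-user population first.

    Ranking by Internet users is an assumption about what "largest" means.
    """
    scores = normalized_coverage(wiki_size, internet_users, baseline)
    below = [lang for lang, score in scores.items() if score < 1.0]
    below.sort(key=lambda lang: internet_users[lang], reverse=True)
    return [(lang, scores[lang]) for lang in below[:top]]
```

A score of 0.1 for a language would mean it has one tenth as much Wikipedia content per Internet user as English does on the chosen metric.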
(A few folks in the audience of our talk at Wikimania asked us to leave a
soft copy of the slides that we presented that show this. I haven't
forgotten about this --- I am still working with PR to make that deck
publicly available.)
What has to happen so that a collaboration with Google Translation
will begin in these languages? Do their representatives have to
approach Google or is it usually Google's decision?
We can do either (Google-initiated or community-initiated).
If you'd like for us to work with a particular language, feel free to reach
out to us directly. Please email translator-toolkit-support at google.com.
--
אָמִיר אֱלִישָׁע אַהֲרוֹנִי
Amir Elisha Aharoni
http://aharoni.wordpress.com
"We're living in pieces,
I want to live in peace." - T. Moore
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/foundation-l