Re: [Foundation-l] Push translation

13 Aug 2010

Hi Amir,

Apologies for the late reply. Replies inline below.

Mike

On Fri, Aug 6, 2010 at 3:14 PM, Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il> wrote:

...
 Dear Michael, I also thank you for joining the discussion. See my
 question below.

 2010/8/6 Michael Galvez <michaelcg(a)gmail.com>:
 Also, as far as Indic languages go, I would ask if there's any chance
 you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
 is by far the most anemic of Indic-language Wikipedias, in spite of a
 speaker population of 31 million.

 Oriya is one of the languages we'd love to work on. We don't have any
 activity on this today but if you have some Wikipedians who'd like to
 help us get this off the ground, we'd love to get their contact info
 and we can follow up from there.

 How do you decide, in general, with which languages to work? If I
 understand correctly, until now you worked with Arabic, Swahili and
 several Indian languages. But there are also languages in other parts
 of the world whose Wikipedias could profit from such a project.

...
 For example, the Greek Wikipedia is surprisingly small with only
 54,500 articles (13 million speakers); Armenian has only 10,000
 articles (6.7 million speakers); Georgian has 42,000 articles (4
 million speakers). AFAIK, these language communities are largely
 monolingual, that is, speakers of these languages may know English or
 Russian, but they usually prefer to speak and write their own, unlike,
 for example, speakers of Native American languages, many of whom use
 English, Spanish or Portuguese online.

To decide which languages to target, we looked at several sets of metrics:
- we looked at the size of each Wikipedia based on words, articles, non-stub
articles (measured by articles over 2Kb), non-stub words (extrapolated),
from here: http://stats.wikimedia.org/EN/
- we looked at the number of Internet users in each of those languages from
here: http://www.internetworldstats.com/stats.htm
We also considered doing more refined measurements by accounting for Google
activity and mobile, but we ultimately went for the simple metrics above.

We took these numbers and calculated the number of words/articles/non-stub
articles/non-stub words per Internet user and normalized it with the
English Wikipedia = 1. We then focused on the largest languages that
had deficits vis-a-vis English.
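
The calculation above can be sketched roughly as follows. This is only an illustration, not the actual spreadsheet or code we used: the function names are hypothetical, the figures are made-up placeholders rather than real statistics, and ranking the "largest" languages by Internet-user count is an assumption.

```python
def normalized_coverage(content, internet_users, baseline="en"):
    """Content per Internet user for each language, scaled so baseline == 1.0.

    `content` can hold any of the size metrics (words, articles,
    non-stub articles, non-stub words), keyed by language code.
    """
    per_user = {lang: content[lang] / internet_users[lang] for lang in content}
    base = per_user[baseline]
    return {lang: value / base for lang, value in per_user.items()}


def largest_deficits(normalized, internet_users, baseline="en"):
    """Languages below the baseline ratio, largest Internet population first.

    Sorting by Internet-user count is an assumption about what
    "largest languages" means here.
    """
    below = [
        lang
        for lang, ratio in normalized.items()
        if lang != baseline and ratio < 1.0
    ]
    return sorted(below, key=lambda lang: internet_users[lang], reverse=True)


# Placeholder figures only, NOT real statistics.
articles = {"en": 1000, "xx": 100, "yy": 50}
users = {"en": 100, "xx": 50, "yy": 10}

ratios = normalized_coverage(articles, users)
targets = largest_deficits(ratios, users)
print(ratios)
print(targets)
```

With real inputs, the same function would be run once per metric, and a language with a ratio well below 1 on most metrics would count as having a deficit.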

(A few folks in the audience of our talk at Wikimania asked us to leave a
soft copy of the slides that we presented that show this.  I haven't
forgotten about this --- I am still working with PR to make that deck
publicly available.)

...
 What has to happen so that a collaboration with Google Translation
 will begin in these languages? Do their representatives have to
 approach Google or is it usually Google's decision?

We can do either (Google-initiated or community-initiated).

If you'd like for us to work with a particular language, feel free to reach
out to us directly.  Please email translator-toolkit-support at google.com.

...
  --
 אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 Amir Elisha Aharoni

 http://aharoni.wordpress.com

 "We're living in pieces,
  I want to live in peace." - T. Moore

 _______________________________________________
 foundation-l mailing list
 foundation-l(a)lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
