[Foundation-l] Push translation

Michael Galvez michaelcg at gmail.com
Fri Aug 13 20:28:02 UTC 2010


Hi Amir,

Apologies for the late reply.   Replies inline below.

Mike

On Fri, Aug 6, 2010 at 3:14 PM, Amir E. Aharoni <
amir.aharoni at mail.huji.ac.il> wrote:

> Dear Michael, I also thank you for joining the discussion. See my
> question below.
>
> 2010/8/6 Michael Galvez <michaelcg at gmail.com>:
> >> Also, as far as Indic languages go, I would ask if there's any chance
> >> you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
> >> is by far the most anemic of Indic-language Wikipedias, in spite of a
> >> speaker population of 31 million.
> >>
> >>
> > Oriya is one of the languages we'd love to work on.  We don't have any
> > activity on this today but if you have some Wikipedians who'd like to
> > help us get this off the ground, we'd love to get their contact info
> > and we can follow up from there.
>
> How do you decide, in general, which languages to work with? If I
> understand correctly, until now you worked with Arabic, Swahili and
> several Indian languages. But there are also languages in other parts
> of the world whose Wikipedias could profit from such a project.


> For example, the Greek Wikipedia is surprisingly small with only
> 54,500 articles (13 million speakers); Armenian has only 10,000
> articles (6.7 million speakers); Georgian has 42,000 articles (4
> million speakers). AFAIK, these language communities are largely
> monolingual, that is, speakers of these languages may know English or
> Russian, but they usually prefer to speak and write their own, unlike,
> for example, speakers of Native American languages, many of whom use
> English, Spanish or Portuguese online.
>

To decide which languages to target, we looked at several sets of metrics:
- the size of each Wikipedia in words, articles, non-stub articles
  (articles over 2 KB), and non-stub words (extrapolated), from here:
  http://stats.wikimedia.org/EN/
- the number of Internet users in each of those languages, from here:
  http://www.internetworldstats.com/stats.htm
We also considered doing more refined measurements by accounting for Google
activity and mobile, but we ultimately went for the simple metrics above.

We took these numbers and calculated the number of words/articles/non-stub
articles/non-stub words per Internet user, and normalized each ratio so that
the English Wikipedia = 1.  We then focused on the largest languages that
had deficits vis-a-vis English.
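
As a rough illustration of the normalization described above, here is a
minimal Python sketch. It is not the actual analysis: the names WikiStats,
normalized_coverage and rank_deficits are hypothetical, the figures are
made-up placeholders rather than real statistics, and ordering the
"largest" languages by Internet users is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WikiStats:
    language: str
    size: int            # any one size metric: words, articles, non-stub articles...
    internet_users: int  # Internet users for that language


def per_user(stats: WikiStats) -> float:
    """Size metric per Internet user."""
    return stats.size / stats.internet_users


def normalized_coverage(stats: WikiStats, english: WikiStats) -> float:
    """Per-user ratio scaled so that the English Wikipedia equals 1."""
    return per_user(stats) / per_user(english)


def rank_deficits(candidates, english):
    """Languages scoring below English, largest Internet population first.

    Sorting by Internet users is an assumption about what "largest" means.
    """
    scored = [(s, normalized_coverage(s, english)) for s in candidates]
    deficits = [(s, score) for s, score in scored if score < 1.0]
    return sorted(deficits, key=lambda pair: pair[0].internet_users, reverse=True)


# Placeholder figures only, NOT real Wikipedia or Internet statistics.
english = WikiStats("English", size=1000, internet_users=100)
candidates = [
    WikiStats("Language A", size=50, internet_users=50),
    WikiStats("Language B", size=30, internet_users=10),
    WikiStats("Language C", size=240, internet_users=20),
]

for stats, score in rank_deficits(candidates, english):
    print(f"{stats.language}: {score:.2f} of the English per-user level")
```

The same calculation would be repeated once per size metric (words,
articles, non-stub articles, non-stub words).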

(A few folks in the audience of our talk at Wikimania asked us to leave a
soft copy of the slides that we presented that show this.  I haven't
forgotten about this --- I am still working with PR to make that deck
publicly available.)


> What has to happen so that a collaboration with Google Translation
> will begin in these languages? Do their representatives have to
> approach Google or is it usually Google's decision?
>

We can do either (Google-initiated or community-initiated).

If you'd like for us to work with a particular language, feel free to reach
out to us directly.  Please email translator-toolkit-support at google.com.



> --
> אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> Amir Elisha Aharoni
>
> http://aharoni.wordpress.com
>
> "We're living in pieces,
>  I want to live in peace." - T. Moore
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>

