[Wikimedia-l] The case for supporting open source machine translation

George Herbert george.herbert at gmail.com
Wed Apr 24 07:02:27 UTC 2013


I agree.  This is a timely observation about a major problem which directly affects the Foundation's core goals.

I am unsure how far an effort can go today given the state of the art and science, but I think it is entirely appropriate to think about and investigate this, and perhaps to fund it, bring attention to it, or both.


George William Herbert

On Apr 23, 2013, at 11:39 PM, Ting Chen <wing.philopp at gmx.de> wrote:

> Oh yes, this would really be great. Just think about the money the Foundation currently spends on translation, plus the many, many hours of volunteer work invested in it. Free and open translation software is long overdue indeed. Great idea, Erik.
> 
> Greetings
> Ting
> 
> On 4/24/2013 8:29 AM, Erik Moeller wrote:
>> Wikimedia's mission is to make the sum of all knowledge available to
>> every person on the planet. We do this by enabling communities in all
>> languages to organize and collect knowledge in our projects, removing
>> any barriers that we're able to remove.
>> 
>> In spite of this, there are and will always be large disparities in
>> the amount of locally created and curated knowledge available per
>> language, as is evident from simple statistical comparison (and most
>> beautifully visualized in Erik Zachte's bubble chart [1]).
>> 
>> Google, Microsoft and others have made great strides in developing
>> free-as-in-beer translation tools that can be used to translate from
>> and to many different languages. Increasingly, it is possible to at
>> least make basic sense of content in many different languages using
>> these tools. Machine translation can also serve as a starting point
>> for human translations.
>> 
>> Although these tools are free-as-in-beer for basic usage, integration
>> can be expensive. Google Translate charges $20 per 1M characters of
>> text for API usage. [2] These tools improve as people use them, but
>> I've seen little evidence of sharing of open datasets that would help
>> the field get better over time.
>> 
>> Undoubtedly, building the technology and the infrastructure for these
>> translation services is a very expensive undertaking, and it's
>> understandable that there are multiple commercial reasons that drive
>> the major players' ambitions in this space. But if we look at it from
>> the perspective of "How will billions of people learn in the coming
>> decades", it seems clear that better translation tools should at least
>> play some part in reducing knowledge disparities in different
>> languages, and that ideally, such tools should be "free-as-in-speech"
>> (since they're fundamentally related to speech itself).
>> 
>> If we imagine a world where top notch open source MT is available,
>> that would be a world where increasingly, language barriers to
>> accessing human knowledge could be reduced. True, translation is no
>> substitute for original content creation in a language -- but it could
>> at least powerfully support and enable such content creation, and
>> thereby help hundreds of millions of people. Beyond Wikimedia, high
>> quality open source MT would likely be integrated in many contexts
>> where it would do good for humanity and allow people to cross into
>> cultural and linguistic spaces they would otherwise not have access
>> to.
>> 
>> While Wikimedia is still only a medium-sized organization, it is not
>> poor. With more than 1M donors supporting our mission and a cash
>> position of $40M, we do now have a greater ability to make strategic
>> investments that further our mission, as communicated to our donors.
>> That's a serious level of trust and not to be taken lightly, either by
>> irresponsibly spending, or by ignoring our ability to do good.
>> 
>> Could open source MT be such a strategic investment? I don't know, but
>> I'd like to at least raise the question. I think the alternative will
>> be, for the foreseeable future, to accept that this piece of
>> technology will be proprietary, and to rely on goodwill for any
>> integration that concerns Wikimedia. Not the worst outcome, but also
>> not the best one.
>> 
>> Are there open source MT efforts that are close enough to merit
>> scrutiny? In order to provide high quality results, you
>> would need not only a motivated, well-intentioned group of people, but
>> some of the smartest people in the field working on it. I doubt we
>> could do more than kickstart an effort, but perhaps financial backing at
>> significant scale could at least help a non-profit, open source effort
>> to develop enough critical mass to go somewhere.
>> 
>> All best,
>> Erik
>> 
>> [1] http://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html
>> [2] https://developers.google.com/translate/v2/pricing
>> --
>> Erik Möller
>> VP of Engineering and Product Development, Wikimedia Foundation
>> 
>> Wikipedia and our other projects reach more than 500 million people every
>> month. The world population is estimated to be >7 billion. Still a long
>> way to go. Support us. Join us. Share: https://wikimediafoundation.org/
>> 
>> _______________________________________________
>> Wikimedia-l mailing list
>> Wikimedia-l at lists.wikimedia.org
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


