[Wikimedia-l] The case for supporting open source machine translation

Tim Starling tstarling at wikimedia.org
Thu Apr 25 23:30:22 UTC 2013


On 24/04/13 16:29, Erik Moeller wrote:
> Are there open source MT efforts that are close enough to merit
> scrutiny? In order to be able to provide high-quality results, you
> would need not only a motivated, well-intentioned group of people, but
> some of the smartest people in the field working on it. I doubt we
> could do more than kickstart an effort, but perhaps financial backing at
> significant scale could at least help a non-profit, open source effort
> to develop enough critical mass to go somewhere.

We could basically clone the frontend component of Google Translate,
and use Moses as a backend. The work would be mostly JavaScript, which
we can do. When VisualEditor wraps up, we'll have several JavaScript
developers looking for a project.

Google Translate gathers its own parallel corpus, and does it in a way
that's accessible to non-technical bilingual speakers, so I think it's
a nice model. The quality of its translations has improved enormously
over the years, and I suppose most of that change is due to improved
training data.

If we develop it as a public-facing open source product, then other
Moses users could start using it. We could host it on GitHub, so that
if it turns out to be popular, we could let it gradually evolve away
from WMF control.

Once the frontend tool is done, the next job would be to develop a
corpus sharing site, hosting any available freely licensed output of
the frontend tool.
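
As a rough illustration of what such a site might publish: Moses trains
on a sentence-aligned parallel corpus, i.e. two plain-text files with one
sentence per line, where line N of one file is the translation of line N
of the other. Below is a minimal Python sketch of exporting frontend
contributions in that form. The Contribution record, the license
identifiers, and export_parallel_corpus are hypothetical names invented
for this example; they are not part of Moses or of any existing tool.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Contribution:
    """One user-supplied translation pair (hypothetical record format)."""
    source: str   # sentence in the source language
    target: str   # the contributor's translation
    license: str  # license the contributor agreed to


# Assumption: which licenses count as "freely licensed" is a policy choice.
FREE_LICENSES = {"CC0-1.0", "CC-BY-SA-3.0"}


def _one_line(text: str) -> str:
    """Collapse all whitespace so each sentence occupies exactly one line."""
    return " ".join(text.split())


def export_parallel_corpus(contributions, out_dir, src_lang, tgt_lang,
                           stem="corpus"):
    """Write <stem>.<src_lang> and <stem>.<tgt_lang>, aligned line by line.

    Pairs that are not freely licensed, or that are empty on either side,
    are skipped. Returns the number of pairs written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    src_path = out_dir / f"{stem}.{src_lang}"
    tgt_path = out_dir / f"{stem}.{tgt_lang}"

    written = 0
    with src_path.open("w", encoding="utf-8") as src, \
            tgt_path.open("w", encoding="utf-8") as tgt:
        for c in contributions:
            s, t = _one_line(c.source), _one_line(c.target)
            if c.license not in FREE_LICENSES or not s or not t:
                continue
            # Both files are written in the same iteration, so they
            # always have the same number of lines.
            src.write(s + "\n")
            tgt.write(t + "\n")
            written += 1
    return written
```

Calling export_parallel_corpus(pairs, "out", "en", "de") would produce
out/corpus.en and out/corpus.de. Real training data would still need
tokenization and cleaning before use; this sketch covers only the file
layout.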

-- Tim Starling



