[Wikimedia-l] The case for supporting open source machine translation

Milos Rancic millosh at gmail.com
Fri Apr 26 18:20:52 UTC 2013


On Thu, Apr 25, 2013 at 4:26 PM, Denny Vrandečić
<denny.vrandecic at wikimedia.de> wrote:
> Not just bootstrapping the content. By having the primary content be saved
> in a language independent form, and always translating it on the fly, it
> would not merely bootstrap content in different languages, but it would
> mean that editors from different languages would be working on the same
> content. The texts in the different languages are not translations of each
> other, but they are all created from the same source. There would be no
> primacy of, say, English.

What we can do is make Simple English Wikipedia more useful: write
rewrite rules from Simple English to Controlled English, and use them
to fill the content of the smaller Wikipedias from Simple English
Wikipedia. That's the only way to get anything more useful than
Google Translate output.

There are serious problems with the "translation of translation"
process, and that kind of complexity is beyond the reach of
contemporary science. (Basically, even good machine translation is
beyond the reach of contemporary science. Statistical approaches are
useful for getting a basic understanding, but very bad for writing an
encyclopedia or anything else that requires correct output in the
target language.)

Even with much simpler conversion engines, we can see that an error
rate of just 1% (or 1% of text needing manual intervention) is a
serious issue for text integrity, while translations of translations
create many more errors, with or without human intervention. That's
not acceptable for the average editor of the project in the target
language.
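To make the compounding concrete, here is a minimal sketch assuming that each conversion or translation step independently corrupts a given sentence with a fixed probability. This is an idealization (real errors are neither independent nor uniform), and the function name and rates below are hypothetical, chosen only for illustration. Under that assumption, the fraction of sentences that come through a chain of steps untouched is the product of the per-step success rates.

```python
from math import prod

def clean_fraction(error_rates):
    """Fraction of sentences with no error after a chain of steps.

    Hypothetical model: each step independently corrupts a sentence
    with the given probability, so success rates simply multiply.
    """
    return prod(1.0 - e for e in error_rates)

# A single simple conversion step with a 1% sentence error rate.
print(round(clean_fraction([0.01]), 4))  # 0.99

# A hypothetical pivot chain (source -> intermediate -> target),
# each step with an assumed 10% sentence error rate.
chain = clean_fraction([0.10, 0.10])
print(round(chain, 4))  # 0.81

# Expected number of damaged sentences in a 1000-sentence article.
print(round(1000 * (1 - chain)))  # 190
```

Under this model every extra step in the chain can only lower the clean fraction, which is the core of the objection to translating translations.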

That said, we'd need serious linguistic work for every language added
to the system.

On the other hand, I support Erik's intention to build a free software
tool for machine translation. But note that it's just the second step
(Wikidata was the first one) on a long road.
