On Sun, 25 Jul 2010 18:10:54 +0300, Amir E. Aharoni wrote:
2010/7/25 Shiju Alex <shijualexonline@gmail.com>:
Hello All,
Recently there have been a lot of discussions (on this list as well) regarding Google's translation project for some of the big language Wikipedias. The Foundation also seems to have approved Google's efforts. But I am not sure whether anyone is interested in consulting the respective language communities to learn their views.
At the same session at Wikimania a very sensible approach was presented by Mikel Iturbe from the Basque Wikipedia:
- They didn't use Google Translate, but an academically-developed tool, which also happened to be Free Software - which diminished the arguments about commercialization.
Probably Matxin (http://sourceforge.net/projects/matxin/)
Matxin is somewhat related to Apertium, which I am involved with. Some Apertium developers tried to make it less Basque-specific, but weren't entirely successful.
- The editors' community was involved throughout the whole process.
- Articles were not uploaded without correcting mistakes that the translation software made.
- What's also important, the corrections were reported to the translation software developers, so they would try to improve it.
Of course, not every language community can afford to develop Free-as-in-speech academic translation software, but the other points are useful to everybody.
Depending on the languages involved and the amount of resources available for them, and given realistic expectations, a usable system can be made in as little as 3-6 months by a single motivated volunteer with help from experienced developers. Earlier this year, at the request of Crisis Commons, 3 of us built a Haitian Creole to English prototype in less than a week.
Staying motivated is *hard*. We have 2-3 times as many half-working prototypes as we have released language pairs. Having realistic expectations is hard, too. People want English, and/or they want to include *everything* (budget at least a year of full-time work for anything to English).
If you know the difference between a noun, an adjective, and a verb, understand Zipf's law, and want open source MT for a pair of languages, come find us on #apertium on FreeNode. We'll be happy to help.
Mikel Iturbe's presentation:
The academic papers related to that project:
* http://ixa.si.ehu.es/openmt2/argitalpenak_html
* http://ixa.si.ehu.es/Ixa/Argitalpenak/Artikuluak/index_html?Atala=Artikulua_Itzulpen_automatikoa