On Fri, Jul 26, 2013 at 3:25 PM, David Cuenca dacuetu@gmail.com wrote:
This is the preliminary draft:
https://meta.wikimedia.org/wiki/Collaborative_Machine_Translation_for_Wikipe...
The linked page says:
For this kind of project it is preferred to use a rule-based machine translation system (https://en.wikipedia.org/wiki/Rule-based_machine_translation), because total control is wanted over the whole process and minority languages should be accounted for (not that easy with statistical MT (https://en.wikipedia.org/wiki/Statistical_machine_translation), where parallel corpora may be non-existent).
This statement seems rather defeatist to me. Step one of a machine translation effort should be to provide tools for annotating parallel texts in the various wikis, and for editing and maintaining their parallelism. Once this is done, you have a substantial parallel corpus, which can then be used to grow the set of translated articles. That is, minority languages ought to be accounted for by progressively expanding the number of translated articles in their encyclopedias, as we do now; as that happens, the machine translation improves incrementally. If there is not enough of an editor community to translate articles, I don't see how you will succeed at the much more technically demanding task of writing rules for a rule-based translation system. The beauty of the statistical approach is that little special ability is needed. --scott
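To make the point concrete, here is a minimal sketch of how translation knowledge falls out of a parallel corpus with no hand-written rules: a few EM iterations of IBM Model 1 word alignment over a toy corpus. The sentence pairs below are made-up illustrations, not actual Wikipedia data.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical English-Spanish pairs, for illustration only).
corpus = [
    (["the", "house"], ["la", "casa"]),
    (["the", "book"], ["el", "libro"]),
    (["a", "book"], ["un", "libro"]),
]

# IBM Model 1: estimate word-translation probabilities t(f|e) purely from
# co-occurrence in aligned sentence pairs, via expectation-maximization.
src_vocab = {e for es, _ in corpus for e in es}
tgt_vocab = {f for _, fs in corpus for f in fs}
# Start from a uniform distribution; the data does all the work from here.
t = {f: {e: 1.0 / len(src_vocab) for e in src_vocab} for f in tgt_vocab}

for _ in range(20):                  # EM iterations
    count = defaultdict(float)       # expected counts c(f, e)
    total = defaultdict(float)       # expected counts c(e)
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[f][e] for e in es)
            for e in es:             # E-step: fractional alignment counts
                delta = t[f][e] / norm
                count[(f, e)] += delta
                total[e] += delta
    for f in tgt_vocab:              # M-step: re-normalize the counts
        for e in src_vocab:
            t[f][e] = count[(f, e)] / total[e]

# "book" co-occurs with "libro" in every pair containing it, so EM
# concentrates probability mass there.
best = max(t["libro"], key=t["libro"].get)
print(best)  # → book
```

Scaling this beyond a toy requires real alignment models and far more data, which is exactly why the annotation tooling for parallel wiki texts should come first.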