On Fri, Jun 21, 2013 at 2:54 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
For literature grade translations, that is, for a true dictionary, I believe
that you need the full range of nuances attached to each word and each word
sense, which is distinct from the platonic concepts described by data items.

Literature-grade translations require knowledge of both cultural domains and of context, and sometimes there is simply no correspondence between concepts. This is also an amazing quote:
"Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia." -- Claude Piron, Le défi des langues (The Language Challenge)

However, we are in a wonderful situation, because:
1) Wikipedia is not a literary work, so the translation requirements are not that high.
2) It has a lot of users who can manually disambiguate the source text with semantic annotations, and users in the target language who can fill in the gaps.
3) There is prior work in the open source rule-based machine translation (RBMT) world, so there is no need to start from scratch.
4) A translation portal for the wiki world already exists, and it is going to be expanded.

Basically, almost all the building blocks needed to create a powerful MT system for WP are already there or waiting to be integrated. What I believe is missing is a model for storing the morphological information from Wiktionary templates in a structured way, so that the data becomes machine-readable and usable. It will require some prior work to create a coherent model based on the expression/sense entity types. Doable with some intense full-time dedication :)
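
To make the idea a bit more concrete, here is a rough sketch of what an expression/sense model could look like. Everything in it is an assumption made up for illustration: the class names, the fields, the feature labels and the "Q-EXAMPLE" identifier are hypothetical and are not taken from Wiktionary, Wikidata or any existing proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Sense:
    """One meaning of an expression, optionally linked to a concept."""
    gloss: str
    # Hypothetical identifier of a language-neutral data item.
    concept_id: Optional[str] = None


@dataclass
class Expression:
    """A word in one language, with its inflected forms and its senses."""
    lemma: str
    language: str
    part_of_speech: str
    # Maps a set of grammatical features to the inflected surface form.
    forms: Dict[FrozenSet[str], str] = field(default_factory=dict)
    senses: List[Sense] = field(default_factory=list)

    def inflect(self, *features: str) -> str:
        """Return the stored form for these features, else the lemma."""
        return self.forms.get(frozenset(features), self.lemma)


# Example entry; the data is invented for illustration only.
camp = Expression(
    lemma="camp",
    language="en",
    part_of_speech="noun",
    forms={frozenset({"plural"}): "camps"},
    senses=[Sense(gloss="place where prisoners are held",
                  concept_id="Q-EXAMPLE")],
)

print(camp.inflect("plural"))
print(camp.senses[0].gloss)
```

The point of keeping forms and senses as separate lists on the expression is that morphology stays language-specific while each sense can still point to a shared concept. A real model would need far more than this (pronunciation, gender, irregular paradigms, usage notes), but the split itself seems to be the part that has to be agreed on first.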
