On 05/22/2014 05:41 PM, Petr Bena wrote:
I was looking for a free (possibly open-source) provider of automatic translations for an open-source application I am working on, and I had quite some trouble finding one. Then I realized we have a project called "wiktionary" which could possibly help me here (I was assuming it's an open dictionary), but I was quite disappointed, as I couldn't find any simple way to perform simple queries like:
There are several open-source machine translation projects. They are either rule-based or statistics-based. One of the rule-based projects is Apertium.
When you start from zero, building a rule-based system gives you something useful fairly quickly, especially if the two languages are similar. A statistics-based system (such as Google Translate) requires enormous amounts of data before it becomes useful.
It's not something that you can start as a subproject within Wiktionary, or even as a separate WMF project. It's a very large task.
One naive approach is to train a statistics-based machine translation (SMT) system on the European Union's freely available parallel text corpus. If you ask such a system to translate Finnish "terve" (which means: hello!) into English, it will say "health", because the same word also means health, and EU texts talk about healthcare but never say "hello". The system can only reproduce what its training data contains.
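To make the "terve" problem concrete, here is a minimal Python sketch of the core idea: a statistics-based translator picks the translation it has seen most often. This is not how any real SMT system is implemented. It assumes the words have already been aligned into pairs, and the function names (train, translate) and all the counts are invented purely for illustration.

```python
from collections import Counter, defaultdict


def train(aligned_pairs):
    """Count how often each source word was aligned to each target word."""
    table = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        table[source_word][target_word] += 1
    return table


def translate(table, word):
    """Return the translation seen most often in training, or None if unseen."""
    if word not in table:
        return None
    return table[word].most_common(1)[0][0]


# Invented toy data (assumption): a healthcare-only corpus never contains
# "terve" used as a greeting, so the only candidate is "health".
healthcare_pairs = [("terve", "health")] * 5

# Invented toy data (assumption): a corpus that also includes everyday
# conversation, where the greeting sense dominates.
mixed_pairs = healthcare_pairs + [("terve", "hello")] * 20

print(translate(train(healthcare_pairs), "terve"))  # health
print(translate(train(mixed_pairs), "terve"))       # hello
```

The model itself is identical in both runs; only the training data differs. That is why the choice of corpus matters so much for a statistics-based system.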