If you, Petr, were going to take a rule-based approach to what you've outlined above, and use Wikidata's existing interlinguality, which I think is based around the 'item with a label' (think of a Wikipedia encyclopedia article; is this correct?), and build on Wiktionary, could one 'reduce' Wikidata's interlinguality from an 'item' to a 'word' (while also anticipating voice, smartphones, and extensibility/scalability to all 7,106+ languages, for example)? What else would be needed, and what would some of the initial challenges be in beginning this way?
Cheers, Scott
(I write the above in the context of developing the wiki, CC, MIT OCW-centric WUaS for free online university degrees, which plans to be in all 7,106+ languages as schools (http://worlduniversity.wikia.com/wiki/Languages) and to develop a universal translator (http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator) as well.)
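To make the item-versus-word question concrete, here is a minimal sketch under stated assumptions. An item-level interlingua maps one concept to one label per language, while a word can carry several senses, each pointing at a different concept. The classes `Item`, `Sense` and `Word`, the identifiers `CONCEPT_GREETING` and `CONCEPT_HEALTH`, and the function `candidate_translations` are all hypothetical and do not mirror any actual Wikidata or Wiktionary data model or API. The data reuses the Finnish "terve" example from Lars's message quoted below.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A language-independent concept with one label per language code."""
    item_id: str
    labels: dict[str, str]


@dataclass
class Sense:
    """One meaning of a word, linked to the concept (item) it expresses."""
    gloss: str
    item_id: str


@dataclass
class Word:
    """A word form in a single language; it may have several senses."""
    language: str
    form: str
    senses: list[Sense] = field(default_factory=list)


# Hypothetical identifiers and data, for illustration only (not real Wikidata Q-ids).
ITEMS = {
    "CONCEPT_GREETING": Item("CONCEPT_GREETING", {"en": "hello", "fi": "terve"}),
    "CONCEPT_HEALTH": Item("CONCEPT_HEALTH", {"en": "health", "fi": "terveys"}),
}

# One Finnish word form with two senses, following the gloss in the thread.
TERVE = Word("fi", "terve", [
    Sense("greeting", "CONCEPT_GREETING"),
    Sense("health", "CONCEPT_HEALTH"),
])


def candidate_translations(word, target_lang, items):
    """Return (gloss, label) for every sense whose item has a target-language label."""
    results = []
    for sense in word.senses:
        item = items.get(sense.item_id)
        if item and target_lang in item.labels:
            results.append((sense.gloss, item.labels[target_lang]))
    return results


if __name__ == "__main__":
    for gloss, label in candidate_translations(TERVE, "en", ITEMS):
        print(f"{gloss}: {label}")
```

Going from 'item' to 'word' adds a step that items alone do not have: the sketch returns one candidate per sense, and something (rules, context, or statistics) still has to choose between them.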
On Thu, May 22, 2014 at 9:03 AM, Lars Aronsson lars@aronsson.se wrote:
On 05/22/2014 05:41 PM, Petr Bena wrote:
I was looking for a free (possibly open source) provider of automatic translations for an open source application I am working on, and had quite some trouble finding one. Then I realized we have a project called "Wiktionary" which could possibly help me here (I was assuming it's an open dictionary), but I was quite disappointed, as I couldn't find any simple way to perform simple queries like:
There are several open-source machine translation projects. They are either rule-based or statistics-based. One of the rule-based projects is Apertium.
When you start from zero, building a rule-based system gives you a useful system quite fast, especially if the two languages are similar. A statistics-based system (such as Google Translate) requires enormous amounts of data to become useful.
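As a toy illustration of why a rule-based system becomes useful quickly, the sketch below combines a small bilingual dictionary with a single transfer rule (Spanish noun-adjective order becomes English adjective-noun order). The lexicon, the tag names and the functions `analyse`, `reorder` and `translate` are hypothetical; this is not Apertium's actual pipeline or data format, which involves far more elaborate morphological analysis and transfer.

```python
# Hypothetical bilingual dictionary: Spanish word -> (English word, part of speech).
LEXICON = {
    "la": ("the", "DET"),
    "el": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "perro": ("dog", "NOUN"),
    "roja": ("red", "ADJ"),
    "negro": ("black", "ADJ"),
}


def analyse(sentence):
    """Look up each token; unknown words are passed through and tagged UNK."""
    return [LEXICON.get(token, (token, "UNK")) for token in sentence.lower().split()]


def reorder(tagged):
    """Transfer rule: Spanish NOUN ADJ becomes English ADJ NOUN."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "NOUN" and result[i + 1][1] == "ADJ":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return result


def translate(sentence):
    """Dictionary lookup, then reordering, then join the English words."""
    return " ".join(word for word, _tag in reorder(analyse(sentence)))


if __name__ == "__main__":
    print(translate("la casa roja"))    # the red house
    print(translate("el perro negro"))  # the black dog
```

Each new rule or dictionary entry improves output immediately, with no training data; the cost is that the rules must be written by hand for every language pair.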
It's not something that you can start as a subproject within Wiktionary, not even as a separate WMF project. It's a very large task.
One naive approach is to base a statistical machine translation (SMT) system on the European Union's freely available parallel text corpus. When you try to translate the Finnish "terve" (which means "hello!") into English with such a system, it will say "health", since the same word also means health, and EU texts only talk about healthcare, never "hello".
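The "terve" problem can be reproduced with a deliberately naive sketch: count how often each source word is aligned to each target word, then always pick the most frequent one. The word alignments are assumed to be given, and both corpora below are made-up stand-ins, not real EU data; `build_translation_table` and `translate_word` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned pairs (source_word, target_word) mimicking a corpus
# where "terve" only ever appears in healthcare contexts. Not real EU data.
EU_LIKE_ALIGNMENTS = [
    ("terve", "health"),
    ("terve", "health"),
    ("terve", "health"),
    ("kiitos", "thank you"),
]

# Hypothetical alignments from everyday conversation, where the greeting dominates.
CHAT_LIKE_ALIGNMENTS = [
    ("terve", "hello"),
    ("terve", "hello"),
    ("terve", "health"),
]


def build_translation_table(alignments):
    """Count how often each source word was aligned to each target word."""
    table = defaultdict(Counter)
    for source, target in alignments:
        table[source][target] += 1
    return table


def translate_word(table, word):
    """Return the most frequently aligned translation, or the word unchanged if unseen."""
    candidates = table.get(word)
    if not candidates:
        return word
    return candidates.most_common(1)[0][0]


if __name__ == "__main__":
    eu_table = build_translation_table(EU_LIKE_ALIGNMENTS)
    chat_table = build_translation_table(CHAT_LIKE_ALIGNMENTS)
    print(translate_word(eu_table, "terve"))    # health
    print(translate_word(chat_table, "terve"))  # hello
```

The counting procedure is identical in both runs; only the training corpus differs, and that alone flips the answer. This is why a statistical system needs large amounts of data from the right domain before it becomes useful.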
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l