I was looking for a free (preferably open source) provider of automatic translations for an open source application I am working on, and had quite some trouble finding one. Then I realized we have a project called "wiktionary" which could possibly help me here (I was assuming it's an open dictionary), but I was quite disappointed, as I couldn't find any simple way to perform simple queries like:
translate "banana" from english to czech
I think that we could (maybe should, in the spirit of openness and wikiness) have some wiki-based web application that would serve this purpose: allow people to query / translate simple words, and maybe even whole phrases. If anyone could edit it, maybe it would grow into a huge dictionary of all possible or frequent phrases that could be easily translated into any language in the world.
Do we already have anything like this?
On Thu, May 22, 2014 at 5:41 PM, Petr Bena benapetr@gmail.com wrote:
Do we already have anything like this?
It doesn't exist yet, but it is on the longer-term plan (2015 at the earliest) for the Wikidata team. The current proposal is at https://www.wikidata.org/wiki/Wikidata:Wiktionary
Cheers Lydia
I am happy to know that we are doing at least "something" on this :) Hopefully it's a first step toward a more complex solution? From the proposal you linked, I can't see how I would easily translate "apple" into a different language. I know I can perform a number of lookups and queries to accomplish that, but IMHO it should be easier.
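The "number of lookups" can be sketched with data shaped like Wikidata items: each item carries one label per language, so translating a word means (1) finding the items whose source-language label matches and (2) reading their target-language labels. The `ITEMS` table and its IDs below are hypothetical stand-ins, not real Wikidata data or API calls.

```python
from typing import Dict, List

# Hypothetical item store: item ID -> {language code -> label}.
# IDs and labels are illustrative only, not real Wikidata items.
ITEMS: Dict[str, Dict[str, str]] = {
    "Q_APPLE_FRUIT": {"en": "apple", "cs": "jablko"},
    "Q_APPLE_TREE": {"en": "apple", "cs": "jabloň"},
    "Q_BANANA": {"en": "banana", "cs": "banán"},
}


def items_with_label(label: str, lang: str) -> List[str]:
    """Lookup 1: every item whose label in `lang` matches `label`."""
    wanted = label.casefold()
    return [
        item_id
        for item_id, labels in ITEMS.items()
        if labels.get(lang, "").casefold() == wanted
    ]


def translate_word(word: str, src: str, tgt: str) -> List[str]:
    """Lookup 2: the target-language label of each matching item."""
    results: List[str] = []
    for item_id in items_with_label(word, src):
        target_label = ITEMS[item_id].get(tgt)
        if target_label is not None and target_label not in results:
            results.append(target_label)
    return results


if __name__ == "__main__":
    print(translate_word("banana", "en", "cs"))
    print(translate_word("apple", "en", "cs"))
```

Because one label can belong to several items (here the fruit and the tree), even this simple case returns candidates rather than a single answer, which is part of why a one-step query is harder than it looks.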
On Thu, May 22, 2014 at 5:47 PM, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
It doesn't exist yet, but it is on the longer-term plan (2015 at the earliest) for the Wikidata team. The current proposal is at https://www.wikidata.org/wiki/Wikidata:Wiktionary
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg district court under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, May 22, 2014 at 5:59 PM, Petr Bena benapetr@gmail.com wrote:
I am happy to know that we are doing at least "something" on this :) Hopefully it's a first step toward a more complex solution? From the proposal you linked, I can't see how I would easily translate "apple" into a different language. I know I can perform a number of lookups and queries to accomplish that, but IMHO it should be easier.
Yes, this is the groundwork for potentially more complex translation systems later. If, when, and how that will be done, I have no idea. But this is the next step on the way ;-)
Cheers Lydia
Just to extend the idea a little, so that it's easier to answer "do we have this?" (I am pretty sure we don't):
This service should be able to do things like this:
TRANSLATE hello there, how are you FROM english TO chinese (the pseudo-query language is just for this example, so that it's clear what I want it to do)
and it /should/:
1. look up the whole sentence "hello there, how are you" in the database; if there is no translation for the whole sentence, it should
2. split the sentence (by comma) and look up only "hello there" and "how are you"; if there is no translation for these, it should
3. split it into words and return a "mechanical" translation for every word (which is the least wanted result, but better than nothing)
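The three-step fallback above can be sketched as follows, assuming a hypothetical in-memory phrase table keyed by normalized source text and language codes (a stand-in for the wiki-edited database). Step 3 is applied to each comma-separated phrase that has no entry of its own, which is one reading of the proposal; English to Czech is used because the sample entries are easy to check.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical phrase table: (normalized source text, source lang, target lang)
# -> translation. In the proposed service this would be the wiki-edited database.
PhraseTable = Dict[Tuple[str, str, str], str]

EXAMPLE_TABLE: PhraseTable = {
    ("how are you", "en", "cs"): "jak se máš",
    ("hello", "en", "cs"): "ahoj",
    ("there", "en", "cs"): "tam",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so lookups are less brittle."""
    return " ".join(text.lower().split())


def lookup(table: PhraseTable, text: str, src: str, tgt: str) -> Optional[str]:
    return table.get((normalize(text), src, tgt))


def translate(table: PhraseTable, sentence: str, src: str, tgt: str) -> str:
    # Step 1: try the whole sentence.
    whole = lookup(table, sentence, src, tgt)
    if whole is not None:
        return whole

    out = []
    # Step 2: split by comma and try each phrase on its own.
    for phrase in (p.strip() for p in sentence.split(",")):
        if not phrase:
            continue
        hit = lookup(table, phrase, src, tgt)
        if hit is not None:
            out.append(hit)
            continue
        # Step 3: word-by-word ("mechanical") fallback; unknown words pass through.
        words = re.findall(r"\w+", phrase)
        out.append(" ".join(lookup(table, w, src, tgt) or w for w in words))
    return ", ".join(out)
```

With this table, `translate(EXAMPLE_TABLE, "Hello there, how are you", "en", "cs")` returns `"ahoj tam, jak se máš"`: "how are you" is found as a phrase, while "hello there" falls through to the word-by-word step and comes out as a literal, unidiomatic rendering, which is exactly why step 3 is the least wanted result.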
If people had the ability to insert and translate words, phrases, and sentences, I think this would be an awesome application, as a lot of people would probably contribute an incredible amount of data and translations.
I don't really know if this is something the Wikimedia movement should provide or support, but either way, it would be nice to have as an open source project :) I know it would be kind of reinventing Google Translate, but Google Translate, no matter how nice it is, isn't free for developers (the APIs are paid) and isn't very open (the source code is closed, and users' ability to edit its database is nowhere near what people can do on real wikis like Wikipedia).
On 05/22/2014 05:41 PM, Petr Bena wrote:
I was looking for a free (preferably open source) provider of automatic translations for an open source application I am working on, and had quite some trouble finding one. Then I realized we have a project called "wiktionary" which could possibly help me here (I was assuming it's an open dictionary), but I was quite disappointed, as I couldn't find any simple way to perform simple queries like:
There are several open-source machine translation projects. They are either rule-based or statistics-based. One of the rule-based projects is Apertium.
When you start from zero, building a rule-based system gives you a useful system quite fast, especially if the two languages are similar. A statistics-based system (such as Google Translate) requires enormous amounts of data to become useful.
It's not something that you can start as a subproject within Wiktionary, not even as a separate WMF project. It's a very large task.
One naive approach is to base a statistics-based machine translator (SMT) on the European Union's freely available parallel text corpus. When you try to translate Finnish "terve" (which means: hello!) into English in such a system, it will say "health", since the same word also means health, and EU texts only talk about healthcare, never "hello".
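The "terve" effect can be reproduced in miniature. The sketch below ranks English candidates for a Finnish word by the Dice coefficient over sentence-level co-occurrence, a much cruder stand-in for the word-alignment models real SMT systems use. The four sentence pairs are hand-made toy data skewed toward a health-and-safety domain, not real EU text; since no pair contains a greeting, "hello" can never be proposed.

```python
from collections import Counter
from typing import List, Tuple


def dice_candidates(pairs: List[Tuple[str, str]], src_word: str) -> List[Tuple[str, float]]:
    """Rank target words by Dice coefficient with `src_word`.

    Dice(f, e) = 2 * pairs(f and e) / (pairs(f) + pairs(e)),
    where counts are numbers of sentence pairs, not token occurrences.
    """
    src_word = src_word.lower()
    src_count = 0
    tgt_count: Counter = Counter()
    co_count: Counter = Counter()
    for src, tgt in pairs:
        src_tokens = set(src.lower().split())
        tgt_tokens = set(tgt.lower().split())
        tgt_count.update(tgt_tokens)
        if src_word in src_tokens:
            src_count += 1
            co_count.update(tgt_tokens)
    scores = {e: 2 * co_count[e] / (src_count + tgt_count[e]) for e in co_count}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, domain-skewed "parallel corpus" (hypothetical; illustrative only).
TOY_PAIRS = [
    ("terve ja turvallisuus", "health and safety"),
    ("terve ympäristö", "health environment"),
    ("työ ja turvallisuus", "work and safety"),
    ("ympäristö ja työ", "environment and work"),
]

if __name__ == "__main__":
    for word, score in dice_candidates(TOY_PAIRS, "terve"):
        print(f"{word}: {score:.2f}")
```

On this data "health" scores 1.00, ahead of "environment" and "safety" (0.50) and "and" (0.40). A corpus with a different domain mix is the only thing that would change the answer, which is the point of the example.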
Petr, if you were to take a rule-based approach to what you've outlined above, using Wikidata's existing interlinguality (which I think is based around the "item with a label"; think of a Wikipedia encyclopedia article, is this correct?) and building on Wiktionary, could one "reduce" Wikidata's interlinguality from an "item" to a "word" (while also anticipating voice, smartphones, and extensibility/scalability to all 7,106+ languages, for example)? What else would be needed, and what would some of the initial challenges of beginning this way be?
Cheers, Scott
(I write the above in the context of developing the wiki-based, CC, MIT OCW-centric WUaS for free online university degrees, which plans to be in all 7,106+ languages as schools (http://worlduniversity.wikia.com/wiki/Languages) and to develop a universal translator (http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator) as well.)
On Thu, May 22, 2014 at 9:03 AM, Lars Aronsson lars@aronsson.se wrote:
There are several open-source machine translation projects. They are either rule-based or statistics-based. One of the rule-based projects is Apertium.
When you start from zero, building a rule-based system gives you a useful system quite fast, especially if the two languages are similar. A statistics-based system (such as Google Translate) requires enormous amounts of data to become useful.
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
This isn't about translating the content of current Wikimedia projects, but rather about creating a generic tool that anyone could use to translate anything, so it's not really what [[Content translation]] describes.
On Thu, May 22, 2014 at 6:39 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
This is currently being developed:
https://www.mediawiki.org/wiki/Content_translation
It will provide all the tools needed to translate wiki articles, including dictionary lookup. The back-end service interfaces will be fairly generic and will use open source tools like dictd and Apertium, so they might be useful for non-wiki projects.
Yes, this statistics-based system would be more like what I meant, but keep in mind that if it were open, so that anyone could contribute to the database just like on Wikipedia, it would probably collect an enormous amount of data pretty quickly, just as Wikipedia did.
On Thu, May 22, 2014 at 6:03 PM, Lars Aronsson lars@aronsson.se wrote:
A statistics-based system (such as Google Translate) requires enormous amounts of data to become useful.
It's not something that you can start as a subproject within Wiktionary, not even as a separate WMF project. It's a very large task.
Great ... looks like MediaWiki Content translation and Wiktionary may provide another important approach to a possible Universal Translator ... :)
Scott
On Thu, May 22, 2014 at 9:48 AM, Petr Bena benapetr@gmail.com wrote:
This isn't about translating the content of current Wikimedia projects, but rather about creating a generic tool that anyone could use to translate anything, so it's not really what [[Content translation]] describes.
There are more free and openly accessible parallel texts besides the EU ones. One bigger project is OPUS[1], which contains, for example, free software translations and subtitles.
Another kind of text that is suitable for statistical machine translation is comparable texts: texts written about the same thing, but not necessarily translations of each other. This kind of text is harder to align into a translation dictionary model, but it may be easier to find. From one point of view, the whole of Wikipedia, with its language links, can be seen as a huge corpus of comparable texts. There are free tools for aligning comparable texts; one that comes to mind right now is Yalign[2], [3]. Another source of comparable texts is news articles about the same event.
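As a rough illustration of mining comparable texts (this is not how Yalign works; it uses its own trained sentence-similarity model), one can pair each source sentence with the target sentence that shares the most words through a small seed dictionary, and keep only pairs above a threshold. The seed dictionary, the Czech and English sentences, and the 0.5 threshold below are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokens(sentence: str) -> List[str]:
    """Lower-cased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", sentence.lower())


def overlap_score(src: str, tgt: str, seed: Dict[str, str]) -> float:
    """Fraction of source words whose seed translation appears in `tgt`."""
    src_words = tokens(src)
    if not src_words:
        return 0.0
    tgt_words = set(tokens(tgt))
    hits = sum(1 for w in src_words if seed.get(w) in tgt_words)
    return hits / len(src_words)


def align(src_sents: List[str], tgt_sents: List[str], seed: Dict[str, str],
          threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Greedy best match per source sentence, filtered by `threshold`."""
    if not tgt_sents:
        return []
    pairs = []
    for s in src_sents:
        best = max(tgt_sents, key=lambda t: overlap_score(s, t, seed))
        score = overlap_score(s, best, seed)
        if score >= threshold:
            pairs.append((s, best, score))
    return pairs


# Hypothetical comparable texts and seed dictionary (Czech -> English).
SEED = {"banán": "banana", "je": "is", "žlutý": "yellow",
        "jablko": "apple", "ovoce": "fruit"}
CS = ["Banán je žlutý.", "Jablko je ovoce.", "Praha je město."]
EN = ["An apple is a fruit.", "The banana is yellow.", "It rained all day."]
```

Here `align(CS, EN, SEED)` keeps the two sentences covered by the seed dictionary and drops "Praha je město.", whose only known word ("je") is not enough evidence. Real aligners add one-to-one constraints and better similarity measures; the threshold is the knob that trades recall for precision.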
Best wishes! Kristian
[1] http://opus.lingfil.uu.se/
[2] http://yalign.machinalis.com/
[3] https://github.com/machinalis/yalign
22.05.2014 19:03, Lars Aronsson wrote:
One naive approach is to base a statistics-based machine translator (SMT) on the European Union's freely available parallel text corpus. When you try to translate Finnish "terve" (which means: hello!) into English in such a system, it will say "health", since the same word also means health, and EU texts only talk about healthcare, never "hello".
On 05/22/2014 08:41 AM, Petr Bena wrote:
Do we already have anything like this?
This is currently being developed:
https://www.mediawiki.org/wiki/Content_translation
It will provide all the tools needed to translate wiki articles, including dictionary lookup. The back-end service interfaces will be fairly generic and will use open source tools like dictd and Apertium, so they might be useful for non-wiki projects.
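To give a feel for the dictd side, here is a minimal client sketch for the DICT protocol (RFC 2229), which dictd serves over TCP, by default on port 2628. It sends one `DEFINE` command and parses the reply: each definition starts with a `151` status line, and its text ends with a line containing a single `.`. The host is whatever server you run; this is an illustrative client, not the interface Content translation itself uses.

```python
import socket
from typing import Callable, List


def read_define_reply(readline: Callable[[], str]) -> List[str]:
    """Collect reply lines for one DEFINE command, up to the final status line."""
    lines: List[str] = []
    in_text = False
    while True:
        raw = readline()
        if not raw:  # connection closed
            break
        line = raw.rstrip("\r\n")
        lines.append(line)
        if in_text:
            if line == ".":  # end of one definition's text
                in_text = False
        elif line.startswith("151 "):  # a definition's text follows
            in_text = True
        elif line.startswith("250") or line[:1] in ("4", "5"):
            break  # 250 = done; 4xx/5xx = error, e.g. 552 "no match"
    return lines


def parse_definitions(lines: List[str]) -> List[str]:
    """Return the text of each definition in a DEFINE reply."""
    definitions: List[str] = []
    current: List[str] = []
    in_text = False
    for line in lines:
        if in_text:
            if line == ".":
                definitions.append("\n".join(current))
                current, in_text = [], False
            else:
                # Undo dot-stuffing: a leading ".." stands for a literal ".".
                current.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151 "):
            in_text = True
    return definitions


def define(word: str, host: str, database: str = "*", port: int = 2628) -> List[str]:
    """Look up `word` on a dictd server; "*" searches all databases."""
    with socket.create_connection((host, port), timeout=10) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="") as stream:
        banner = stream.readline()
        if not banner.startswith("220"):
            raise ConnectionError(f"unexpected banner: {banner!r}")
        stream.write(f'DEFINE {database} "{word}"\r\n')
        stream.flush()
        reply = read_define_reply(stream.readline)
        stream.write("QUIT\r\n")
        stream.flush()
    return parse_definitions(reply)
```

Splitting reading from parsing keeps the protocol logic testable against canned replies without a live server; `define()` itself needs a reachable dictd instance.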
You can also use existing commercial APIs of course.
Gabriel