[Wikimedia-l] The case for supporting open source machine translation

Jane Darnell jane023 at gmail.com
Fri Apr 26 12:04:19 UTC 2013


We already have the translation options on the left side of the screen
in any Wikipedia article.
This choice is generally a smattering of languages. A long-term
goal for many small-language Wikipedias is to be able to translate an
article from a related language (say from Dutch into Frisian, where the
Frisian Wikipedia has no article at all on the title subject), and the
even longer-term goal is to translate into some other
really-really-really foreign language.

Wouldn't it be easier, however, to start with a project that uses
translatewiki and the related-language pairs? Usually there is a big
difference in numbers of articles (like between the Dutch Wikipedia
and the Frisian Wikipedia). Presumably the demand is larger on the
destination Wikipedia (because there are fewer articles in those
languages), and the potential number of human translators is larger
(because most editors active in the smaller Wikipedia are versed in
both languages).

The Dutch Wikimedia chapter took part in a European multilingual
synchronization tool project called CoSyne:
http://cosyne.eu/index.php/Main_Page

It was not a success, because it was hard to figure out how this would
be beneficial to Wikipedians actually joining the project. Some
funding that was granted to the chapter to work on the project will be
returned, because it was never spent.

In order to tackle this problem on a large scale, it needs to be
broken down into words, sentences, paragraphs and perhaps other
structures (category trees?). I think CoSyne was trying to do this. I
think it would be easier to keep the effort one-way, so try
to offer machine translation from Dutch to Frisian and not the other
way around, and then as you go, define concepts that work both ways,
so that eventually it would be possible to translate from Frisian
into Dutch.

2013/4/26, Mathieu Stumpf <psychoslave at culture-libre.org>:
> Le 2013-04-25 20:56, Theo10011 a écrit :
>> As far as Linguistic typology goes, it's far too unique and too
>> varied to
>> have a language independent form develop as easily. Perhaps it also
>> depends
>> on the perspective. For example, the majority of people commenting
>> here
>> (Americans, Europeans) might have exposure to a limited set of a
>> linguistic
>> branch. Machine-translations as someone pointed out, are still not
>> preferred in some languages, even with years of research and
>> potentially
>> unlimited resources at Google's disposal, they still come out
>> sounding
>> clunky in some ways. And perhaps they will never get to the level of
>> absolute, where they are truly language independent.
>
> To my mind, there's no such thing as "absolute" meaning. It's all about
> interpretation in a given context by a given interpreter. I mean, I do
> think that MT could probably be as good as professional translators.
> But even professional translators can't make "perfect translations". I
> already gave the example of poetry, but you may also take the example of
> humour, which asks for some cultural background; otherwise you have to
> explain why it's funny, and you know that if you have to explain a joke,
> it's not a joke.
>
>> If you read some of
>> the discussions in linguistic relativity (Sapir-Whorf hypothesis),
>> there is
>> research to suggest that a language a person is born with dictates
>> their
>> thought processes and their view of the world - there might not be
>> absolutes when it comes to linguistic cognition. There is something
>> inherently unique in the cognitive patterns of different languages.
>
> That's just how the learning process works: you can't "understand"
> something you didn't experience. Reading an algorithm won't give you the
> insight you'll get when you process it mentally (with the help of pencil
> and paper), and a textual description of "making love" won't provide you
> the feeling it provides.
>
>
>> Which brings me to the point, why not English? Your idea seems
>> plausible
>> enough even if you remove the abstract idea of complete language
>> universality, without venturing into the science-fiction labyrinth of
>> man-machine collaboration.
>
> English has many so-called "non-neutral" problems. As far as I know,
> if the goal is to use a syntactically unambiguous human language, Lojban
> is the best current candidate. English as an international language is a
> very harmful situation. Believe it or not, I sometimes have to
> translate into English sentences which are written in French, because the
> writer was thinking with an English idiomatic locution that he poorly
> translated into French, his native language, in which he doesn't know the
> idiomatic locution. Even worse, I have read people who were using
> concepts with an English locution because they never matched it with the
> French locution that they know. And conversely, I'm not sure that
> having millions of people speaking broken English is a wonderful
> situation for this language.
>
> Search "why not english as international language" if you need more
> documentation.
>
> --
> Association Culture-Libre
> http://www.culture-libre.org/
