[Wikimedia-l] The case for supporting open source machine translation

Jane Darnell jane023 at gmail.com
Sat Apr 27 08:53:04 UTC 2013


Just the thought of synchronizing wikis makes me shudder. I think this
was also the reason that no Wikipedia editors were attracted to the
CoSyne project, though as it was explained to me, the idea was to
translate only those sections of a "source" Wikipedia article that did
not yet exist in the target article. This may be useful when the source
is a large article and the target is a stub, but when the source and
the target are about equal in size, it could lead to a major mess.
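
To make the "only translate what is missing" idea concrete, here is a
rough sketch in Python. It is not how CoSyne worked internally (I do not
know that); the function name, the section titles and the hand-made
title alignment are all made up for illustration. It assumes someone has
already mapped source section titles to their target-language
equivalents.

```python
def sections_to_translate(source_sections, target_sections, alignment):
    """Return the source section titles that have no counterpart in the target.

    source_sections: section titles of the source article, in order.
    target_sections: section titles of the target article.
    alignment: hypothetical, hand-made mapping from a source title to the
        equivalent target-language title.
    """
    present = set(target_sections)
    # A section is a translation candidate when its aligned title is unknown
    # or does not appear among the target article's sections.
    return [s for s in source_sections if alignment.get(s) not in present]


# Illustrative data only: a large Dutch source article and a shorter English one.
source = ["Geschiedenis", "Openbaar vervoer", "Cultuur"]
target = ["History", "Culture"]
alignment = {
    "Geschiedenis": "History",
    "Openbaar vervoer": "Public transport",
    "Cultuur": "Culture",
}

print(sections_to_translate(source, target, alignment))
```

With a stub as the target almost every section comes back, which is the
useful case. With two articles of similar size the list is short, and
the real differences sit inside the sections, where a title comparison
like this one sees nothing.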

In the example of the Wikipedia article on Haarlem, I noticed many of
the things lacking in the English version are things more relevant to
local people reading the Dutch version, such as local mass transit
information. The other way around, the things in the English version
that are lacking in the Dutch version are items that seem obvious to
locals.

2013/4/27, Ryu Cheol <rcheol at gmail.com>:
> Thanks to Jane for introducing CoSyne. But I feel the wikis do not all want
> to be synchronized with certain wikis. Rather than having identical articles,
> I hope they would have their own articles. I hope I could have two more tabs
> to the right of 'Article' and 'Talk' on English Wikipedia for the Korean
> language. The two tabs are 'Article in Korean' and 'Talk in Korean'. The
> translations would have the same information as the originals, and any edit
> to an article or a talk page in translation would go back to the originals.
> In this case they need to be synchronized precisely.
>
> I mean these are done in the scope of English Wikipedia, not related to
> Korean Wikipedia. But the Korean Wikipedia linked on the left side of a page
> would eventually benefit from the translations in English Wikipedia
> when a Korean Wikipedia editor finds that a good part of an English
> Wikipedia article could be inserted into Korean Wikipedia.
>
> You can find the merits of the exact Korean translation of English Wikipedia
> or the scheme of the exact translation of big Wikipedias. It will help you
> reach more potential contributors. It will lower the language barrier
> for those who want to contribute to a Wikipedia they do not speak very
> well. Also, it could provide better-aligned corpora and it could
> track how human translators or reviewers improve the translations.
>
> Cheol
>
> On 2013. 4. 26., at 9:04 PM, Jane Darnell <jane023 at gmail.com> wrote:
>
>> We already have the translation options on the left side of the screen
>> in any Wikipedia article.
>> This choice is generally a smattering of languages, and a long term
>> goal for many small-language Wikipedias is to be able to translate an
>> article from related languages (say from Dutch into Frisian, where the
>> Frisian Wikipedia has no article at all on the title subject) and the
>> even longer-term goal is to translate into some other
>> really-really-really foreign language.
>>
>> Wouldn't it be easier, however, to start with a project that uses
>> translatewiki and the related-language pairs? Usually there is a big
>> difference in numbers of articles (like between the Dutch Wikipedia
>> and the Frisian Wikipedia). Presumably the demand is larger on the
>> destination Wikipedia (because there are fewer articles in those
>> languages), and the potential number of human translators is larger
>> (because most editors active in the smaller Wikipedia are versed in
>> both languages).
>>
>> The Dutch Wikimedia chapter took part in a European multilingual
>> synchronization tool project called CoSyne:
>> http://cosyne.eu/index.php/Main_Page
>>
>> It was not a success, because it was hard to figure out how this would
>> be beneficial to Wikipedians actually joining the project. Some
>> funding that was granted to the chapter to work on the project will be
>> returned, because it was never spent.
>>
>> In order to tackle this problem on a large scale, it needs to be
>> broken down into words, sentences, paragraphs and perhaps other
>> structures (category trees?). I think CoSyne was trying to do this. I
>> think it would be easier to keep the effort one-way traffic, so try
>> to offer machine translation from Dutch to Frisian and not the other
>> way around, and then as you go, define concepts that work both ways,
>> so that eventually it would be possible to translate from Frisian
>> into Dutch.
>>
>> 2013/4/26, Mathieu Stumpf <psychoslave at culture-libre.org>:
>>> On 2013-04-25 20:56, Theo10011 wrote:
>>>> As far as linguistic typology goes, it's far too unique and too varied
>>>> to have a language-independent form develop as easily. Perhaps it also
>>>> depends on the perspective. For example, the majority of people
>>>> commenting here (Americans, Europeans) might have exposure to a limited
>>>> set of a linguistic branch. Machine translations, as someone pointed
>>>> out, are still not preferred in some languages; even with years of
>>>> research and potentially unlimited resources at Google's disposal, they
>>>> still come out sounding clunky in some ways. And perhaps they will
>>>> never get to the level of absolute, where they are truly language
>>>> independent.
>>>
>>> To my mind, there's no such thing as "absolute" meaning. It's all about
>>> interpretation in a given context by a given interpreter. I mean, I do
>>> think that MT could probably be as good as professional translators.
>>> But even professional translators can't make "perfect translations". I
>>> already gave the example of poetry, but you may also take the example of
>>> humour, which asks for some cultural background; otherwise you have to
>>> explain why it's funny, and you know that if you have to explain a joke,
>>> it's not a joke.
>>>
>>>> If you read some of the discussions in linguistic relativity
>>>> (Sapir-Whorf hypothesis), there is research to suggest that a language
>>>> a person is born with dictates their thought processes and their view
>>>> of the world - there might not be absolutes when it comes to linguistic
>>>> cognition. There is something inherently unique in the cognitive
>>>> patterns of different languages.
>>>
>>> That's just how the learning process works: you can't "understand"
>>> something you didn't experience. Reading an algorithm won't give you the
>>> insight you'll get when you process it mentally (with the help of pencil
>>> and paper), and a textual description of "making love" won't give you the
>>> feeling it provides.
>>>
>>>
>>>> Which brings me to the point, why not English? Your idea seems
>>>> plausible enough even if you remove the abstract idea of complete
>>>> language universality, without venturing into the science-fiction
>>>> labyrinth of man-machine collaboration.
>>>
>>> English has many so-called "non-neutral" problems. As far as I know,
>>> if the goal is to use a syntactically unambiguous human language, Lojban
>>> is the best current candidate. English as an international language is a
>>> very harmful situation. Believe it or not, I sometimes have to
>>> translate into English sentences which are written in French, because the
>>> writer was thinking of an English idiomatic expression that he poorly
>>> translated into French, his native language, in which he doesn't know the
>>> idiomatic expression. Even worse, I have read people who were using
>>> concepts with an English expression because they never matched it with the
>>> French expression that they know. And conversely, I'm not sure that
>>> having millions of people speaking broken English is a wonderful
>>> situation for this language.
>>>
>>> Search "why not english as international language" if you need more
>>> documentation.
>>>
>>> --
>>> Association Culture-Libre
>>> http://www.culture-libre.org/
>>>
>>> _______________________________________________
>>> Wikimedia-l mailing list
>>> Wikimedia-l at lists.wikimedia.org
>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>