[Foundation-l] Is Google translation good for Wikipedias?
Ziko van Dijk
zvandijk at googlemail.com
Wed Jul 28 13:42:56 UTC 2010
Dear colleagues,
My experiences with the Translator Toolkit are negative, too. It happened
just too often that a sentence was so twisted that I did not
understand it. Checking it against the original took me a lot of time, so
I decided that doing the translation myself is much quicker and
more reliable. It does nobody any good to read Wikipedia articles in
gibberish.
The idea that the translation tool does the work and that a human
being has to make only a few small corrections has simply failed.
Especially negative, to me, was that the Translator Toolkit encourages you
to translate sentence by sentence.
I don't want to do anyone an injustice, but in my view there are two
groups of Wikipedians:
- those who want to see huge article numbers and believe that any
article with any content is good, whatever its quality, and that the
Wikipedians are sufficient to do the rest.
- those who believe that (at least a minimum of) quality is important and
that articles below a certain level do damage to a Wikipedia. The
small number of Wikipedians cannot cope with the work. They welcome
not just any content, but content that meets the likely interests of
their readers.
It seems to me that the first group is mainly populated by computer
specialists and native speakers of English. The second group consists of
language specialists and non-native speakers of English. But of course there
are many exceptions.
Kind regards
Ziko van Dijk
2010/7/28 Shiju Alex <shijualexonline at gmail.com>:
>>
>> We welcome automation in translation, but not at the expense of
>> introducing incorrect and messy content on wikipedia. We'd rather stay
>> small and hand-craft than allow an experimental tool and unskilled
>> paid translators creating a big mess.
>>
>
>
> Yes. This is the answer that you will get from most of the active wiki
> (small wiki) communities where this project is going on. Many of the small
> wiki communities are not as worried about the numbers as some big Wikipedias
> are. Quality is more important for small wikis, where the number of contributors
> is small. *Many of us will use this quality matrix* itself to bring in more
> people.
>
> My real concern is about the rift that is opening in a language community
> due to this project. Issues of a language wiki are taken outside the wiki to
> prove some points against its contributors. Two types of communities are
> evolving out of this project: *Google's Wiki community* and *Wiki's wiki
> community*. :) This is really annoying as far as small wikis are concerned.
>
> So, some sort of intervention is required to make sure this project runs
> smoothly on the different Wikipedias.
>
>
> ~Shiju
>
>
> On Wed, Jul 28, 2010 at 1:38 AM, Ragib Hasan <ragibhasan at gmail.com> wrote:
>
>> As an admin on Bengali Wikipedia, I had to deal with this issue a lot
>> (some of which was discussed in the Telegraph (India) newspaper
>> article). But I'd like to elaborate on our stance here:
>>
>> (The tool used was Google Translator Toolkit, not Google Translate.
>> There is a distinction between these two tools. Google Translator
>> Toolkit (GTT) is a translation-memory-based, semi-manual translation
>> tool. That is, it learns translation skills as you gradually translate
>> articles by hand. Later, this can be used to automate translation.)
>>
>> Issues:
>> 1. Community involvement: First of all, the local community was not at
>> all involved in or informed about this project. All of a sudden, we found
>> new users signing up, dropping a large article on a random topic, and
>> moving on. These users never responded to any talk page messages, so
>> we first assumed these were just random users experimenting with
>> Wikipedia.
>>
>> Even now, no one from Google has contacted us at Bengali Wikipedia to
>> inform us about Google's intentions. This is not a problem by itself,
>> but see the following points.
>>
>> 2. Translation quality: The quality of the translations was awful. The
>> translations added to Bengali wikipedia were artificial, dry, and used
>> obscure words and phrases. It looked as if a non-native speaker sat
>> down with a dictionary in hand, and mechanically translated each
>> sentence word by word. That led to sentences which are hard to
>> understand, or downright nonsensical.
>>
>> The articles were half-done. Numerals were not translated at all. The
>> punctuation symbol for the Bengali language (the "danda" symbol: । ) was
>> not used (apparently, GTT and/or the Google transliteration tool does
>> not support it).
>>
>> The articles were also full of spelling mistakes. The paid translator
>> misspelled many simple words, or even used different spellings for the
>> same word in different parts of the article.
>>
>> Finally, different languages have different sentence structures.
>> Sometimes, a complex sentence is better expressed if broken up into two
>> sentences in another language. We found that the translators simply
>> translated sentences preserving their English-language structure. This
>> made the resulting Bengali sentences awkward and artificial to read.
>> For example, we do not write "If x then y" in Bengali just by
>> replacing if and then with the corresponding Bengali words. But the
>> translators did that; apparently this is an artifact of using GTT.
>>
>>
>> 3. Lack of follow-up: When we found the above problems, naturally, we
>> asked the contributors to fix them. We got no reply. It is NOT the task of
>> volunteers to clean up the mess after the one-night-standish paid
>> translators. Given the small number of volunteers active at any given
>> moment, it will take enormous effort on our part to go through these
>> articles and fix the punctuation, spelling, and grammar issues, not to
>> mention the awkward language style used by the translators.
>>
>> So, after getting the cold shoulder from the paid translators about
>> fixing their mess, we had to ban such edits outright. We didn't know
>> who was behind this until the Wikimania talk from Google. Not that it
>> matters ... even now, we won't allow these half-done and badly
>> translated articles on Bengali Wikipedia.
>>
>> Bengali wikipedia is small (21k articles), but we do not want to
>> populate it overnight with badly translated content, some of which
>> won't even qualify as grammatically correct Bengali. While wikipedia
>> may be a perpetual work in progress, that does not mean we need to be
>> guinea-pigs of some careless experiments. So, our stance is, "Thanks,
>> but NO Thanks!". Unless, of course, they can put enough commitment
>> into the translations and fix mistakes.
>>
>> We welcome automation in translation, but not at the expense of
>> introducing incorrect and messy content on wikipedia. We'd rather stay
>> small and hand-craft than allow an experimental tool and unskilled
>> paid translators creating a big mess.
>>
>>
>> Thanks
>>
>> Ragib (User:Ragib on en and bn)
>>
>> --
>> Ragib Hasan, Ph.D
>> NSF Computing Innovation Fellow and
>> Assistant Research Scientist
>>
>> Dept of Computer Science
>> Johns Hopkins University
>> 3400 N Charles Street
>> Baltimore, MD 21218
>>
>> Website:
>> http://www.ragibhasan.com
>>
>>
>>
>>
>> On Sun, Jul 25, 2010 at 2:12 AM, Shiju Alex <shijualexonline at gmail.com> wrote:
>> > Hello All,
>> >
>> > Recently there have been a lot of discussions (on this list also) regarding the
>> > translation project by Google for some of the big language Wikipedias. The
>> > foundation also seems to have approved Google's efforts. But I am not sure
>> > whether anyone is interested in consulting the respective language community
>> > to know their views.
>> >
>> > As far as I know, only Tamil, Bengali, and Swahili Wikipedians have raised
>> > their concerns about Google's project. But does this mean that other
>> > communities are happy about Google's efforts? If there is no active community
>> > in a Wikipedia, how can we expect a response from it? If there is no
>> > response from a community, does that mean that Google can hire some native
>> > speakers and use machine translation to create articles for that Wikipedia?
>> >
>> > Now let us go back to a basic question. Does WMF require a wiki community to
>> > create a Wikipedia in any language? Or can they utilize the services of
>> > companies like Google to create Wikipedias in N number of languages?
>> >
>> > One of the main points raised by the supporters of Google translation is
>> > that Google's project is good *for the online version of the language*. That
>> > might be true. But nobody has cared to verify whether it is good for
>> > Wikipedia.
>> >
>> > As pointed out by Ravi in his presentation at Wikimania
>> > (http://docs.google.com/present/view?id=ddpg3qwc_279ghm7kbhs), the Google
>> > translation of Wikipedia articles:
>> >
>> > - will affect the organic growth of a Wikipedia article
>> > - will create copies of English Wikipedia articles in local wikis
>> > - goes against some of the basic philosophies of Wikipedia
>> >
>> > People outside the wiki will definitely benefit from this tool if the Google
>> > translation tool is developed for each language. I saw a working example
>> > of this in Poland during Wikimania, when some people who are not good at
>> > English used Google translator to communicate with us. :)
>> >
>> > Apart from the points raised by Ravi in his presentation, this will affect
>> > community growth. If there is no active wiki community, how can we expect
>> > anyone to look after all these junk articles uploaded to the wiki every day?
>> > When all the important article links have already turned blue, how can we
>> > expect any future potential editors? So in my view, Google's project is
>> > killing the growth of an active wiki community.
>> >
>> > Of course, Tamil Wikipedia is trying to use the Google project effectively. But
>> > only Tamil is doing that, since it has an active wiki community. *Many
>> > wiki communities are not even aware that such a project is happening in
>> > their wiki*.
>> >
>> > I do not want to point out specific language Wikipedias to prove my point.
>> > But visit the Wikipedias (especially Wikipedias *that use non-Latin scripts*)
>> > to view the status of the Google translation project. Loads of junk articles
>> > are uploaded to the wikis every day. Most of the time the only edits to these
>> > articles are the edit by their creator and those by the interlanguage wiki bots.
>> >
>> > This effort will definitely affect community growth. Kindly see the points
>> > raised by a Swahili Wikipedian
>> > <http://muddybtz.blog.com/2010/07/16/what-happened-on-the-google-challenge-the-swahili-wikipedia/>.
>> > Many Swahili users (and other language users) now expect a laptop or some
>> > other monetary benefit for writing in their Wikipedia. That affects
>> > community growth.
>> >
>> > So what is the solution for this? Can we take lessons from the
>> > Tamil/Bengali/Swahili Wikipedias and find methods to use this service
>> > effectively, or should we continue with the current article creation process?
>> >
>> > One last question. Is this tool that Google is developing an open
>> > source tool? If not, we need to answer the many questions that may follow.
>> >
>> > Regards
>> >
>> > Shiju Alex
>> > http://en.wikipedia.org/wiki/User:Shijualex
>> > _______________________________________________
>> > foundation-l mailing list
>> > foundation-l at lists.wikimedia.org
>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>> >
>>
--
Ziko van Dijk
Niederlande