[Wikimediaindia-l] Philosophical view on Google translated articles

BalaSundaraRaman sundarbecse at yahoo.com
Thu Apr 22 07:05:34 UTC 2010


Forgot to mention something:

// You can start a page for feature requests (and feature requirements)
for this sort of translation -- and tell the Google translators (in
particular) that all translations /must/ adhere to a certain style or
format, or must be less invasive when an article already exists on the
topic.   (no one will continue a project if they know that its work is
going to be reverted or removed.) //

We've done precisely that. We added categories to group their volunteers, added a template to the translated articles, added another category for talk pages where we have given them feedback so that they can monitor it from one page, and set up a noticeboard for guidelines and requirements. In fact, we were hoping that this would help set the process for any such future project with other Wikipedias as well. While they made use of some of the feedback, as long as newly added articles keep failing to meet the guidelines, it's too much work for the Wikipedians to go after every such article.

Regards,
Sundar

 "That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture



----- Original Message ----
> From: BalaSundaraRaman <sundarbecse at yahoo.com>
> To: wikimediaindia-l at lists.wikimedia.org
> Sent: Thu, April 22, 2010 11:08:01 AM
> Subject: Re: [Wikimediaindia-l] Philosophical view on Google translated articles
> 
> Hi Samuel,
>
> Thanks for the clarification. Good to know that the foundation is in
> the know.

> Ravi and I have acted as interlocutors with the Google team for Tamil
> Wikipedia. We have exchanged several emails and have had one conference
> call with the Google team. During these communications, we have conveyed
> clear bullet-pointed requirements that are the bare minimum necessities
> to meet our guidelines and are very much doable. Of these, to be fair,
> they did address some of our issues, but not the most important ones.

> The most important issues stem from the pillars of Wikipedia, and we
> absolutely can't compromise on them. For Google, the required outcome is
> the number of words in Indian languages SEOed from their query logs. For
> the translators, it's the money that they'll get for each word
> translated. For Wikipedia, the basic necessity is readable and
> meaningful content added through a process that doesn't subvert the
> Wiki way.

> Following is a summary:
> 1. The quality is abysmal. Too mechanical and ungrammatical more than
>    50% of the time. [To set the context for Samuel (who might mistakenly
>    assume that it works as it does for European languages), the toolkit
>    is nowhere near ready for Indian languages and doesn't do any
>    translation as such; it's the translators who do that, and it's
>    unimaginable that a native speaker writes those words, not sentences.]
> 2. The process is hands-off; the translators don't even read the page
>    that they've dumped.
> 3. The pages are broken, with countless erroneous redlinks and missing
>    templates due to an easy-to-fix bug in the kit.
> 4. The basic premise of the team is 'something's better than nothing'.
>    It's not. Having no article on a subject is better than having an
>    unreadable text of 2000 words on that subject.
> 5. Their process requirement: you can pick subjects, give guidelines,
>    but we can't guarantee anything. We don't carry any responsibility to
>    improve the articles once dumped and we don't want you to mess with
>    them. Of course, on the last point, they have backed down. They
>    agreed to have a look at talk page feedback, but only one translator
>    (of roughly 20-30) has responded so far. This is CLEARLY unacceptable
>    and our editors have said it in so many words.

> I also request the community here and the foundation folks to reflect
> on the policy issues: how can we let someone post articles that fall
> below any acceptable standard and that they won't edit further?
> Tomorrow, if a vandal does the same, won't we block them? On top of
> this, they casually mentioned some sort of agreement or contract with
> the foundation, but declined to give any information regarding that.
> Either they don't get what Wikipedia is or they don't care about it.

> On a positive note, we still have our channel open with them and we're
> going to propose that they approach universities or the Classical Tamil
> Institute in Chennai, which undertake such projects employing retired
> Tamil professors and teachers. We will also ask them to accept an
> obligation to fix issues before adding new articles. If they can't do
> that, we don't have any other option left.

> - Sundar



----- Original Message ----
> From: Samuel Klein <meta.sj at gmail.com>
> To: wikimediaindia-l at lists.wikimedia.org
> Sent: Wed, April 21, 2010 11:41:16 PM
> Subject: [Wikimediaindia-l] Re: Philosophical view on Google translated articles
> 
> 
> Hello,
>
> My first post on this list, and a long one :-)  The topic of better
> supporting small language Wikipedias is one that is close to my heart.

> The foundation doesn't have any particular policy on third-party
> translations or article-writing projects.   As Achal says, every
> community is welcome to use translation tools or not as they see fit;
> and to work with outside translation groups or not as they see fit.

> Ravi's concerns are valid -- people interested in translation as a
> whole may want to discuss some of these issues on the foundation and
> translation mailing lists -- you will find that there are many
> multilingual editors who are interested in the good (and bad) uses of
> GTT and other translation tools.


> == On the use of automatic translations ==

> Automatic translations can be useful as one arrow in the quiver of a
> community of editors.  For instance, I find it helpful for translated
> pages to have an automatic category, and a large cleanup template at
> top, something like:
>   "this page was automatically translated by [TOOL]
>    from [permalink to revision of article in another language].
>    It may need cleanup to meet [[STYLE GUIDE|community standards]]."

> In the case of Google and their Translation Toolkit, I think it would
> be good for Wikipedians to give them strong feedback about how they
> need to improve the tool for it to be more useful to Wikipedians.
> (and, if it is more of a nuisance than a help, the community should be
> clear that it is not helping.)


> == On Google's toolkit and translation work ==

> Google has been fairly transparent about what they are doing, and has
> been in touch with the Foundation on a few occasions to ask for advice
> on how to make their tools more useful.   I encourage them to ask the
> local communities directly for that advice... (however, they have had
> few direct responses from those language-communities.  I observed this
> directly on Swahili Wikipedia - there were a few general comments about
> the difficulties raised by GTT overwriting existing articles, but few
> specific feature requests / recommendations / requirements from the
> active Swahili editors.)

> You can start a page for feature requests (and feature requirements)
> for this sort of translation -- and tell the Google translators (in
> particular) that all translations /must/ adhere to a certain style or
> format, or must be less invasive when an article already exists on the
> topic.  (no one will continue a project if they know that its work is
> going to be reverted or removed.)


> > From: Srikanth Ramakrishnan <rsrikanth05 at gmail.com>
> >
> > I agree with Shiju and Ramesh. I tried it out for Hindi. And the
> > phrase 'A fully charged battery' got translated to what would mean a
> > battery that got charged [the court charged]. It isn't all that
> > accurate right now, but it may improve. While to a certain extent, it
> > may seem like Google is catalysing localised content, you can clearly
> > see that Google might be trying to gain monopoly over Wikipedia as
> > well.

> I don't think they have any interest in gaining monopoly over
> Wikipedia.  They are not storing the translated articles, only
> publishing them to Wikipedia.  While they are storing the "translation
> memory" produced as a result, they make that available under a free
> license, for other translators or tools to use.


> Google has carried out similar projects in Arabic and Swahili among
> other languages;  I helped with the recent Swahili Wikipedia Challenge,
> which was supported by GTT (for participants who wanted to use the
> toolkit to translate an article rather than writing one from scratch)
> -- but the resulting articles were rated based on their usefulness, so
> that poorly-translated articles did not rank highly.

> That was a largely community-driven translation effort, with a contest
> run and maintained by Swahili admins.
>   http://sw.wikipedia.org/wiki/WP:KWC

> Cheers,
> SJ
> --
> Samuel Klein      http://meta.wikimedia.org/wiki/user:sj

> _______________________________________________
> Wikimediaindia-l mailing list
> Wikimediaindia-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l



