Forgot to mention something:
// You can start a page for feature requests (and feature requirements)
for this sort of translation -- and tell the Google translators (in
particular) that all translations /must/ adhere to a certain style or
format, or must be less invasive when an article already exists on the
topic. (No one will continue a project if they know that its work is
going to be reverted or removed.) //
We've done precisely that. We added categories to group their volunteers, added a
template to the translated articles, added another category for talkpages where we have
given feedback for them so that they can monitor from one page, and one noticeboard for
guidelines and requirements. In fact, we were hoping that this would help in setting the
process for any such project in the future with other Wikipedias as well. While they made
use of some of the feedback, until the newly added articles start meeting the guidelines,
it's too much work for the Wikipedians to go after every such article.
Regards,
Sundar
"That language is an instrument of human reason, and not merely a medium for the
expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture
----- Original Message ----
From: BalaSundaraRaman <sundarbecse(a)yahoo.com>
To: wikimediaindia-l(a)lists.wikimedia.org
Sent: Thu, April 22, 2010 11:08:01 AM
Subject: Re: [Wikimediaindia-l] Philosophical view on Google translated articles
Hi Samuel,
Thanks for the clarification. Good to know that the foundation
is in the know.
Ravi and I have acted as interlocutors with the Google team for Tamil Wikipedia. We have
exchanged several emails and have had one conference call with the Google team. During these
communications, we have conveyed clear bullet-pointed requirements that are the bare minimum
necessities to meet our guidelines and are very much doable. Of these, to be fair, they did
address some of our issues, but not the most important ones.
The most important issues stem from the pillars of Wikipedia, and we absolutely can't
compromise on those. For Google, the required outcome is the number of words in Indian
languages SEOed from their query logs. For the translators, it's the money that they'll get
for each word translated. For Wikipedia, the basic necessity is readable and meaningful
content added through a process that doesn't subvert the Wiki way.
Following is a summary:
1. The quality is abysmal. Too mechanical and ungrammatical more than 50% of the time. [To set
   the context for Samuel (who might mistakenly assume that it works as it does for European
   languages), the toolkit is nowhere near ready for Indian languages and doesn't do any
   translation as such; it's the translators who do that, and it's unimaginable that a native
   speaker writes those words, not sentences.]
2. The process is hands-off; the translators don't even read the page that they've dumped.
3. The pages are broken with infinite erroneous redlinks and missing templates due to an
   easy-to-fix bug in the kit.
4. The basic premise of the team is 'something's better than nothing'. It's not. Having no
   article on a subject is better than having an unreadable text of 2000 words on that subject.
5. Their process requirement: you can pick subjects, give guidelines, but we can't guarantee
   anything. We don't carry any responsibility to improve the articles once dumped and we
   don't want you to mess with them. Of course, on the last point, they have come down. They
   agreed to have a look at talk page feedback, but only one translator (of nearly 20-30) has
   responded so far. This is CLEARLY unacceptable and our editors have said it in so many
   words.
I also request the community here and the foundation folks to reflect on the policy issues:
how can we let someone post articles of no acceptable level which they won't edit further?
Tomorrow, if a vandal does the same, won't we block them? On top of this, they casually
mentioned some sort of agreement or contract with the foundation, but decline to give any
information regarding that. Either they don't get what Wikipedia is or they don't care about
it.
On a positive note, we still have our channel open with them and we're going to propose that
they approach universities or the Classical Tamil Institute in Chennai, who undertake such
projects employing retired Tamil professors and teachers. We will also propose that they
carry an obligation to fix issues before adding new articles. If they can't do that, we
don't have any other option left.
- Sundar
"That language is an instrument of human reason, and not merely a medium for the
expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture
----- Original Message ----
From: Samuel Klein <meta.sj@gmail.com>
To: wikimediaindia-l@lists.wikimedia.org
Sent: Wed, April 21, 2010 11:41:16 PM
Subject: [Wikimediaindia-l] Re: Philosophical view on Google translated articles
Hello,
My first post on this list, and a long one :-) The topic of better supporting small language
Wikipedias is one that is close to my heart.
The foundation doesn't have any particular policy on third-party translations or
article-writing projects. As Achal says, every community is welcome to use translation tools
or not as they see fit; and to work with outside translation groups or not as they see fit.
Ravi's concerns are valid -- people interested in translation as a whole may want to discuss
some of these issues on the foundation and translation mailing lists -- you will find that
there are many multilingual editors who are interested in the good (and bad) uses of GTT and
other translation tools.
== On the use of automatic translations ==
Automatic translations can be useful as one arrow in the quiver of a community of editors.
For instance, I find it helpful for translated pages to have an automatic category, and a
large cleanup template at top, something like:
"this page was automatically translated by [TOOL] from [permalink to revision of article
in another language]. It may need cleanup to meet [[STYLE GUIDE|community standards]]."
In the case of Google and their Translation Toolkit, I think it would be good for
Wikipedians to give them strong feedback about how they need to improve the tool for it to
be more useful to Wikipedians. (and, if it is more of a nuisance than a help, the community
should be clear that it is not helping.)
== On Google's toolkit and translation work ==
Google has been fairly transparent about what they are doing, and has been in touch with the
Foundation on a few occasions to ask for advice on how to make their tools more useful. I
encourage them to ask the local communities directly for that advice... (however, they have
had few direct responses from those language-communities. I observed this directly on
Swahili Wikipedia - there were a few general comments about the difficulties raised by GTT
overwriting existing articles, but few specific feature requests / recommendations /
requirements from the active Swahili editors.)
You can start a page for feature requests (and feature requirements) for this sort of
translation -- and tell the Google translators (in particular) that all translations /must/
adhere to a certain style or format, or must be less invasive when an article already exists
on the topic. (No one will continue a project if they know that its work is going to be
reverted or removed.)
From: Srikanth Ramakrishnan <rsrikanth05@gmail.com>
I agree with Shiju and Ramesh. I tried it out for Hindi. And the phrase 'A fully charged
battery' got translated to what would mean a battery that got charged [the court charged].
It isn't all that accurate right now, but it may improve. While to a certain extent, it may
seem like Google is catalysing localised content, you can clearly see that Google might be
trying to gain monopoly over Wikipedia as well.
I don't think they have any interest in gaining monopoly over Wikipedia. They are not
storing the translated articles, only publishing them to Wikipedia. While they are storing
the "translation memory" produced as a result, they make that available under a free
license, for other translators or tools to use.
Google has carried out similar projects in Arabic and Swahili among other languages; I
helped with the recent Swahili Wikipedia Challenge, which was supported by GTT (for
participants who wanted to use the toolkit to translate an article rather than writing one
from scratch) -- but the resulting articles were rated based on their usefulness, so that
poorly-translated articles did not rank highly.
That was a largely community-driven translation effort, with a contest run and maintained by
Swahili admins.
Cheers,
SJ
--
Samuel
_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l