Hello,
My first post on this list, and a long one :-) The topic of better supporting small language Wikipedias is one that is close to my heart.
The foundation doesn't have any particular policy on third-party translations or article-writing projects. As Achal says, every community is welcome to use translation tools or not as they see fit; and to work with outside translation groups or not as they see fit.
Ravi's concerns are valid -- people interested in translation as a whole may want to discuss some of these issues on the foundation and translation mailing lists -- you will find that there are many multilingual editors who are interested in the good (and bad) uses of GTT and other translation tools.
== On the use of automatic translations ==
Automatic translations can be useful as one arrow in the quiver of a community of editors. For instance, I find it helpful for translated pages to have an automatic category, and a large cleanup template at top, something like: "this page was automatically translated by [TOOL] from [permalink to revision of article in another language]. It may need cleanup to meet [[STYLE GUIDE|community standards]]."
In the case of Google and their Translation Toolkit, I think it would be good for Wikipedians to give them strong feedback about how they need to improve the tool for it to be more useful to Wikipedians. (And if it is more of a nuisance than a help, the community should say clearly that it is not helping.)
== On Google's toolkit and translation work ==
Google has been fairly transparent about what they are doing, and has been in touch with the Foundation on a few occasions to ask for advice on how to make their tools more useful. I encourage them to ask the local communities directly for that advice. (However, they have had few direct responses from those language communities. I observed this directly on the Swahili Wikipedia: there were a few general comments about the difficulties raised by GTT overwriting existing articles, but few specific feature requests, recommendations, or requirements from the active Swahili editors.)
You can start a page for feature requests (and feature requirements) for this sort of translation -- and tell the Google translators (in particular) that all translations /must/ adhere to a certain style or format, or must be less invasive when an article already exists on the topic. (No one will continue a project if they know that its work is going to be reverted or removed.)
From: Srikanth Ramakrishnan rsrikanth05@gmail.com
I agree with Shiju and Ramesh. I tried it out for Hindi, and the phrase 'A fully charged battery' got translated into something that would mean a battery that got charged [as in 'the court charged']. It isn't all that accurate right now, but it may improve. While to a certain extent it may seem like Google is catalysing localised content, you can clearly see that Google might be trying to gain a monopoly over Wikipedia as well.
I don't think they have any interest in gaining a monopoly over Wikipedia. They are not storing the translated articles, only publishing them to Wikipedia. While they are storing the "translation memory" produced as a result, they make that available under a free license, for other translators or tools to use.
Google has carried out similar projects in Arabic and Swahili among other languages; I helped with the recent Swahili Wikipedia Challenge, which was supported by GTT (for participants who wanted to use the toolkit to translate an article rather than writing one from scratch) -- but the resulting articles were rated based on their usefulness, so that poorly-translated articles did not rank highly.
That was a largely community-driven translation effort, with a contest run and maintained by Swahili admins. http://sw.wikipedia.org/wiki/WP:KWC
Cheers, SJ -- Samuel Klein http://meta.wikimedia.org/wiki/user:sj
Hi Samuel,
Thanks for the clarification. Good to know that the foundation is in the know.
Ravi and I have acted as interlocutors with the Google team for Tamil Wikipedia. We have exchanged several emails and have had one conference call with the Google team. During these communications, we have conveyed clear bullet-pointed requirements that are the bare minimum necessities to meet our guidelines and are very much doable. Of these, to be fair, they did address some of our issues, but not the most important ones.
The most important of the issues stem from the pillars of Wikipedia and we absolutely can't compromise on that. For Google, the required outcome is the number of words in Indian languages SEOed from their query logs. For the translators, it's the money that they'll get for each word translated. For Wikipedia, the basic necessity is readable and meaningful content added through a process that doesn't subvert the Wiki way.
Following is a summary:
1. The quality is abysmal. Too mechanical and ungrammatical more than 50% of the time. [To set the context for Samuel (who might mistakenly assume that it works like it does for European languages): the toolkit is nowhere near ready for Indian languages and doesn't do any translation as such; it's the translators who do that, and it's unimaginable that a native speaker writes those words, not sentences.]
2. The process is hands-off, the translators don't even read the page that they've dumped.
3. The pages are broken with infinite erroneous redlinks and missing templates due to an easy-to-fix bug in the kit.
4. The basic premise of the team is 'something's better than nothing'. It's not. Having no article on a subject is better than having an unreadable text of 2000 words on that subject.
5. Their process requirement: you can pick subjects, give guidelines, but we can't guarantee anything. We don't carry any responsibility to improve the articles once dumped and we don't want you to mess with them. Of course, on the last point, they have come down. They agreed to have a look at talk page feedback, but only one translator (of nearly 20-30) has responded so far. This is CLEARLY unacceptable and our editors have said it in so many words.
I also request the community here and the foundation folks to reflect on the policy issues: how can we let someone post articles of no acceptable level which they won't edit further? Tomorrow, if a vandal does the same, won't we block them? On top of this, they casually mentioned some sort of agreement or contract with the foundation, but decline to give any information regarding that. Either they don't get what Wikipedia is or they don't care about it.
On a positive note, we still have our channel open with them and we're going to propose that they approach universities or the Classical Tamil Institute in Chennai who undertake such projects employing retired Tamil professors and teachers. We will also propose that they carry an obligation to fix issues before adding new articles. If they can't do that, we don't have any other option left.
- Sundar
"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted." - George Boole, quoted in Iverson's Turing Award Lecture
----- Original Message ----
From: Samuel Klein meta.sj@gmail.com To: wikimediaindia-l@lists.wikimedia.org Sent: Wed, April 21, 2010 11:41:16 PM Subject: [Wikimediaindia-l] Re: Philosophical view on Google translated articles
_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Hi Achal and Samuel,
Thank you for your mails.
***What is the board's view on paid editing?***
I think the issue is getting diluted by being viewed only in terms of the translation kit.
The issue is not the use of the GTT itself. The issue is who uses it, how they use it and why they use it.
Maybe the WMF was aware of the operation. But the local communities were not. We were scratching our heads for almost 4,5 months before "discovering" that Google was doing this. This is not in any way transparent.
So far as I have observed, the operation is capable of doing severe damage to the community dynamics (Regular Wikipedians Vs Google Translators? Who contributes more? Can paid contribution be compared to volunteer effort? Is translating the only wiki work? Why preferential treatment for Google translators? Should the community's time be wasted in cleaning up paid workers' mess?).
If the community is not healthy and united, the project will die soon. These are more serious issues than the quality of articles.
Ravi
Hi Sundar and Ravi,
On Thu, Apr 22, 2010 at 2:00 AM, Ravishankar ravidreams@gmail.com wrote:
Hi Achal and Samuel,
Thank you for your mails.
***What is the board's view on paid editing?***
The Foundation doesn't pay people to edit, but some third-party groups have staff who contribute to WP as part of their job (say, sharing information about archives or cultural collections), or contractors who help with translation, from time to time. Many more groups fund topic-specific projects which have as one of their outcomes the improvement of Wikipedia (from class projects on a topic, to academic-group decisions to revise a Wikipedia category as part of their review of material in their field, to authors or publishers publishing their material in part via Wikibooks).
(And the line is not always clear -- the Foundation has accepted grants in the past to support specific topics -- one to support creation of Wikijunior, years ago; and one to set up an award/contest for great illustrations. Does this count as 'paid editing'? We generally take the view that finding ways to recognize great work, encourage new contributors to get involved, reduce barriers to entry, or bring people together for face-to-face meetings, are useful ways to support community growth.)
These sorts of content questions are generally left up to communities to address. There is no official foundation view about whether this is good or not. In my personal opinion, this type of effort has been successful at times and unsuccessful at others in contributing to the world's useful educational material. And again in my personal opinion, contributors should not be blacklisted just because they are contributing while 'at work' -- but they should be expected to follow the same style and conflict-of-interest guidelines as everyone else.
We were scratching our heads for almost 4,5 months before "discovering" that Google was doing this. This is not in any way transparent.
That is surprising. So for 4.5 months you knew that some people were submitting strange translated articles while ignoring their talk pages, but didn't know why? Links to related user pages/contributions would be helpful.
the community dynamics (Regular Wikipedians Vs Google Translators?
I think we set a good standard in the Swahili project for how translation can be useful: as a context that draws new people in to become long-term editors, by lowering the barrier to entry for starting a new article.
Why preferential treatment for Google translators?
There is none that I know of (unless you have a community policy about this).
--------------------------- Earlier, Sundar wrote:
The most important of the issues stem from the pillars of Wikipedia... For Wikipedia, the basic necessity is readable and meaningful content added through a process that doesn't subvert the Wiki way.
Sure. But you can speak to the translators from a position of strength -- if they are not contributing in a positive way, their contributions won't be kept in the project.
- The quality is abysmal. Too mechanical and ungrammatical more than 50% of the time. The process is hands-off, the translators don't even read the page that they've dumped.
Ok, so they need better translators working on the project. If a page is so ungrammatical as to be no better than a redlink, does it fall under one of your deletion policies?
- The pages are broken with infinite erroneous redlinks and missing templates due to an easy-to-fix bug in the kit.
You can always instruct them to suspend the project on Tamil until this is fixed. "we won't be able to accept new articles with the following problems. please fix this bug first."
You, Ravi, Mayooranathan, பரிதிமதி, Karthik, Nat, &c are the project admins -- you don't need any special 'permission' to revert the work of an unhelpful editor. But please bear in mind that these /could/ be productive contributors, and it may be worth mentoring them a bit more rather than asking them to leave.
- The basic premise of the team is 'something's better than nothing'. It's not.
I am not a deletionist, but even inclusionists will agree that 'something' can be worse than nothing when it is incomprehensible.
- Their process requirement: you can pick subjects, give guidelines, but we can't guarantee anything. We don't carry any responsibility to improve the articles once dumped and we don't want you to mess with them.
They are in no position to ask you to 'not mess with' an article; why would this issue come up? They may have no responsibility to improve articles, but you likewise have no responsibility to keep them.
I also request the community here and the foundation folks to reflect on the policy issues: how can we let someone post articles of no acceptable level which they won't edit further? Tomorrow, if a vandal does the same, won't we block them?
Vandalism is blocked because the edits themselves are harmful. People who post unwikified nonsense are rarely blocked, but their work is often reverted or blanked.
On top of this, they casually mentioned some sort of agreement or contract with the foundation, but decline to give any information regarding that. Either they don't get what Wikipedia is or they don't care about it.
That sounds like a 'game of telephone' understanding of the discussions Google has had about how to improve GTT so that it is more useful to Wikipedians, and the successful community collaborations that have happened elsewhere (cf. Swahili Wikipedia). Having met the project manager for GTT, I can say: he *really* does want to make it useful to Wikipedia. That doesn't mean that the people running each subproject care in the same way.
Again: You have no obligation to accept articles that do not meet community standards.
On a positive note, we still have our channel open with them and we're going to propose that they approach universities or the Classical Tamil Institute in Chennai who undertake such projects employing retired Tamil professors and teachers.
This sounds like a great idea.
SJ
Samuel, as a vandalism reverter on the English Wikipedia, let me inform you that when a person posts content with a problem, such as lacking sources or being grammatical nonsense, it still gets reverted, EVEN if it is useful. The guy gets a warning; he comes back, it's revert plus warn again, and ultimately he gets the boot. It happens quite often. I've done it rarely, because the content was disputed, but there are others who do it on a daily basis. Check the automated edits made with Huggle; you'll find plenty of such edits. What Sundar says is more or less the same. My view is that if they refuse to adhere to policies, keep adding errors, and DON'T listen to your suggestions, then go ahead with a warning. If they persist despite the warning, you might as well block them. Regards, Srikanth
On 22 April 2010 20:10, Samuel Klein meta.sj@gmail.com wrote:
Hello Srikanth,
On Thu, Apr 22, 2010 at 11:19 AM, Srikanth Ramakrishnan rsrikanth05@gmail.com wrote:
Samuel, as a vandalism reverter on the English Wikipedia, let me inform you that when a person posts content with a problem, such as lacking sources or being grammatical nonsense, it still gets reverted, EVEN if it is useful. The guy gets a warning; he comes back, it's revert plus warn again, and ultimately he gets the boot. It happens quite often. I've done it rarely, because the
This is a set of aggressive policies developed over time by a large wiki. In the early days of English WP, when it was the size that Tamil WP is now, this was certainly not the case for well-meaning contributors.
I do not know that the current en:wp policies represent best practices -- the research from the strategy project suggests that they are limiting growth and editor retention -- and in particular smaller wikis may have other needs and priorities.
Of course ta:wp is welcome to adopt any policies that you wish, but I would not advise doing so simply because the English Wikipedia does.
Warmly, SJ
Hi Samuel,
I'm an inclusionist myself and most editors in Tamil Wiki are either inclusionists or mergists. A few have concerns that a random sampler might get a mistaken impression about Tamil Wiki, which is one of the most well-written collaborative websites in the Tamil language. As far as these articles are concerned, there's bipartisan support for the view that the quality has to improve.
As a last resort, we've proposed an observation period in yesterday's call with the Google team. We do understand that they mean well for the Wiki, but have a feeling that the constraints imposed by the modus operandi make it not work for us. In yesterday's call, we've conveyed the Tamil Wiki consensus to the Google team and they've agreed to encourage their translators to engage like regular users. We hope that things will improve if that is followed in spirit.
- Sundar
----- Original Message ----
From: Samuel Klein meta.sj@gmail.com To: wikimediaindia-l@lists.wikimedia.org Sent: Fri, April 23, 2010 1:03:24 AM Subject: Re: [Wikimediaindia-l] Philosophical view on Google translated articles
Forgot to mention something:
// You can start a page for feature requests (and feature requirements) for this sort of translation -- and tell the Google translators (in particular) that all translations /must/ adhere to a certain style or format, or must be less invasive when an article already exists on the topic. (no one will continue a project if they know that its work is going to be reverted or removed.) //
We've done precisely that. We added categories to group their volunteers, added a template to the translated articles, added another category for talk pages where we have given feedback for them so that they can monitor from one page, and one noticeboard for guidelines and requirements. In fact, we were hoping that this would help in setting the process for any such project in the future with other Wikipedias as well. While they made use of some of the feedback, until the newly added articles start meeting the guidelines, it's too much work for the Wikipedians to go after every such article.
Regards, Sundar
----- Original Message ----
From: BalaSundaraRaman sundarbecse@yahoo.com To: wikimediaindia-l@lists.wikimedia.org Sent: Thu, April 22, 2010 11:08:01 AM Subject: Re: [Wikimediaindia-l] Philosophical view on Google translated articles
Hi Samuel,
Thanks for the clarification. Good to know that the foundation
is in the know.
Ravi and I have acted as interlocutors with the Google
team for Tamil Wikipedia. We have exchanged several emails and have had one conference call with the Google team. During these communications, we have conveyed clear bullet-pointed requirements that are the bare minimum necessities to meet our guidelines and are very much doable. Of these, to be fair, they did address some of our issues, but not the most important ones.
The most
important of the issues stem from the pillars of Wikipedia and we absolutely can't compromise on that. For Google, the required outcome is the number of words in Indian languages SEOed from their query logs. For the translators, it's the money that they'll get for each word translated. For Wikipedia, the basic necessity is readable and meaningful content added through a process that doesn't subvert the Wiki way.
Following is a summary:
1. The quality is abysmal. Too mechanical and ungrammatical more than 50% of the time. [To set the context for Samuel (who might mistakenly assume that it works as it does for European languages), the toolkit is nowhere near ready for Indian languages and doesn't do any translation as such; it's the translators who do that, and it's unimaginable that a native speaker writes those words, not sentences.]
2. The process is hands-off; the translators don't even read the page that they've dumped.
3. The pages are broken, with countless erroneous redlinks and missing templates due to an easy-to-fix bug in the kit.
4. The basic premise of the team is 'something's better than nothing'. It's not. Having no article on a subject is better than having an unreadable text of 2000 words on that subject.
5. Their process requirement: you can pick subjects, give guidelines, but we can't guarantee anything. We don't carry any responsibility to improve the articles once dumped and we don't want you to mess with them. Of course, on the last point, they have come down. They agreed to have a look at talk page feedback, and only one translator (of nearly 20-30) has responded so far. This is CLEARLY unacceptable and our editors have said it in so many words.
I also request the community here and the foundation folks to reflect on the policy issues: how can we let someone post articles of no acceptable level which they won't edit further? Tomorrow, if a vandal does the same, won't we block them? On top of this, they casually mentioned some sort of agreement or contract with the foundation, but decline to give any information regarding that. Either they don't get what Wikipedia is or they don't care about it.
On a positive note, we still have our channel open with them and we're going to propose that they approach universities or the Classical Tamil Institute in Chennai, who undertake such projects employing retired Tamil professors and teachers. We will also ask that they carry an obligation to fix issues before adding new articles. If they can't do that, we don't have any other option left.
- Sundar
"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted." - George Boole, quoted in Iverson's Turing Award Lecture
----- Original Message ----
From: Samuel Klein <meta.sj@gmail.com>
To: wikimediaindia-l@lists.wikimedia.org
Sent: Wed, April 21, 2010 11:41:16 PM
Subject: [Wikimediaindia-l] Re: Philosophical view on Google translated articles
Hello,
My first post on this list, and a long one :-) The topic of better supporting small language Wikipedias is one that is close to my heart.
The foundation doesn't have any particular policy on third-party translations or article-writing projects. As Achal says, every community is welcome to use translation tools or not as they see fit; and to work with outside translation groups or not as they see fit.
Ravi's concerns are valid -- people interested in translation as a whole may want to discuss some of these issues on the foundation and translation mailing lists -- you will find that there are many multilingual editors who are interested in the good (and bad) uses of GTT and other translation tools.
== on the use of automatic translations ==
Automatic translations can be useful as one arrow in the quiver of a community of editors. For instance, I find it helpful for translated pages to have an automatic category, and a large cleanup template at top, something like: "this page was automatically translated by [TOOL] from [permalink to revision of article in another language]. It may need cleanup to meet [[STYLE GUIDE|community standards]]."
In the case of Google and their Translation Toolkit, I think it would be good for Wikipedians to give them strong feedback about how they need to improve the tool for it to be more useful to Wikipedians. (and, if it is more of a nuisance than a help, the community should be clear that it is not helping.)
== On Google's toolkit and translation work ==
Google has been fairly transparent about what they are doing, and has been in touch with the Foundation on a few occasions to ask for advice on how to make their tools more useful. I encourage them to ask the local communities directly for that advice... (however, they have had few direct responses from those language-communities. I observed this directly on Swahili Wikipedia - there were a few general comments about the difficulties raised by GTT overwriting existing articles, but few specific feature requests / recommendations / requirements from the active Swahili editors.)
You can start a page for feature requests (and feature requirements) for this sort of translation -- and tell the Google translators (in particular) that all translations /must/ adhere to a certain style or format, or must be less invasive when an article already exists on the topic. (no one will continue a project if they know that its work is going to be reverted or removed.)
From: Srikanth Ramakrishnan <rsrikanth05@gmail.com>
I agree with Shiju and Ramesh. I tried it out for Hindi. And the phrase 'A fully charged battery' got translated to what would mean a battery that got charged [the court charged]. It isn't all that accurate right now, but it may improve. While to a certain extent, it may seem like Google is catalysing localised content, you can clearly see that Google might be trying to gain a monopoly over Wikipedia as well.
I don't think they have any interest in gaining monopoly over Wikipedia. They are not storing the translated articles, only publishing them to Wikipedia. While they are storing the "translation memory" produced as a result, they make that available under a free license, for other translators or tools to use.
Google has carried out similar projects in Arabic and Swahili among other languages; I helped with the recent Swahili Wikipedia Challenge, which was supported by GTT (for participants who wanted to use the toolkit to translate an article rather than writing one from scratch) -- but the resulting articles were rated based on their usefulness, so that poorly-translated articles did not rank highly.
That was a largely community-driven translation effort, with a contest run and maintained by Swahili admins.
Cheers, SJ
-- Samuel Klein
_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l