Hi, Yesterday I was asked some tricky questions by a journalist. They went something like this:
*"Does the surge in the number of Indian-language Wikipedia articles have anything to do with paid translations by Google?"*
*"Why should regular Wikipedians be motivated to contribute when Google pays to do the same?"*
I didn't answer them due to the sensitivity of the issue. But I am now curious from a statistical point of view.
I am not referring to the philosophical question of whether paid editing is good or bad.
But is there a way to find out how many of the articles created in the last year were created by normal editors, by bots, and by Google translators in each of the Indian-language Wikipedias? Somebody told me that the translators are supposed to leave the Google Translate URLs in their edit summaries to get paid.
In which languages are Google translations happening? I know it happens on Hindi, Tamil, Kannada and Telugu, at least. Any others? I know it is banned on Bengali & Malayalam.
Regards Tinu Cherian
On Fri, Feb 4, 2011 at 12:31 PM, CherianTinu Abraham tinucherian@gmail.com wrote:
"Why should regular Wikipedians be motivated to contribute when Google pays to do the same ?"
Reminds me, in a tangential way, of http://stormyscorner.com/would-you-do-it-again-for-free
In Tamil Wikipedia, Google-translated articles number 1,240 (barring one or two that we might have forgotten to tag). This is roughly 4.4% of the total articles in ta.wiki. Also, about 100 of the translated topics already had stubs or fair-sized articles before the Google translators overwrote them.
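For anyone who wants to sanity-check these figures, here is a small back-of-the-envelope sketch. It uses only the numbers quoted above; the variable names are made up for illustration, and because "roughly 4.4%" is itself rounded, the implied total (somewhere around 28,000 articles) is only an estimate.

```python
# Figures quoted in the message above.
google_articles = 1240   # tagged Google-translated articles
share_of_total = 0.044   # "roughly 4.4 %" of all ta.wiki articles
overwritten = 100        # "about 100 or so" existing pages that were overwritten

# Implied total article count (an estimate, since 4.4 % is a rounded figure).
implied_total = google_articles / share_of_total

# Fraction of the Google articles that replaced an existing stub or article.
overwrite_fraction = overwritten / google_articles

print(f"implied total articles: about {implied_total:,.0f}")
print(f"overwrote existing pages: about {overwrite_fraction:.1%}")
```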
Creation of new Google articles stopped on August 15, 2010 because of quality concerns. After an extensive quality review process we reached a deal with Google. In this second phase, we pick the translators and the topics to be translated. The translations are done in the translator's user space and moved to article space only after review by Tamil Wikipedians. So far 25 such articles have been translated and are waiting in user space for the review to be completed.
Regarding the paid vs. non-paid issue, this is one of the major bones of contention in ta.wiki. Volunteers get frustrated because paid translators turn in shoddy work, and repeated attempts to train them went nowhere. We have now limited the number of new Google articles to something we think we can manage (25 in the past six months). But the quality still isn't what I would call a "professional translation". I personally have no great hopes for this project. It is a distraction and saps valuable volunteer time and effort that would be better spent elsewhere in Wikipedia. (Personal opinion, not ta.wiki consensus.)
On Fri, Feb 4, 2011 at 12:38 PM, sankarshan foss.mailinglists@gmail.com wrote:
Reminds me, in a tangential way, of http://stormyscorner.com/would-you-do-it-again-for-free
-- sankarshan mukhopadhyay http://sankarshan.randomink.org/blog
Wikimediaindia-l mailing list Wikimediaindia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Hi Bala,
Thanks for the stats.
On Google:
While we have had no new Google articles for the last 5 months, around 600 articles get created every month in Tamil Wikipedia. So in Tamil Wikipedia's case, Google has not contributed much in terms of article count.
But all their articles are very lengthy and hence have contributed significantly to the total number of words in all the Indic wikis they operate in. (Precise word-count stats can be obtained from Google's presentation at the last Wikimania.) Nevertheless, Google articles are still under review in Tamil Wikipedia. They may be moved to the user namespace in bulk and moved back only after meeting quality guidelines.
While Google has contributed in some way to increasing the article count and size of the Wikipedias it operates in, it is a matter of concern that their substandard translations can bring down the image of the Wikipedias concerned. Also, small Wikipedias which don't have a well-knit community won't be able to control Google's operation or improve the quality of the articles.
On Paid translation:
*"Why should regular Wikipedians be motivated to contribute when Google pays to do the same?"*
This is a very easy question. Many of the regular Wikipedians are professionals of high calibre, and their time is worth much more than what the translators get paid. It is not money that motivates them but the spirit of contributing and the pleasure of doing something they like. So they will continue to contribute. Especially when paid translators do substandard work, they think it better to stop it and just do the work themselves.
Ravi
Hi Ravi / Bala, Tamil Wiki is doing great work by not allowing paid translators to dump articles directly into article space, thereby maintaining the overall quality of the Wikipedia. But there are many other Indian-language Wikipedias where the translators move articles directly to mainspace, especially those where the community is weak.
I would like to hear inputs from other languages on this; please share them with us, especially the number of articles created by Google translators vs. organic creation (the last one year is fine) and also your thoughts on this project. There might be projects which are not even aware that something is happening on their Wikipedia.
Regards Tinu Cherian
N.B. What motivates us to contribute to Wikipedia for free is the reason you and I are here :)
At bottom :-
On Fri, Feb 4, 2011 at 14:55, CherianTinu Abraham tinucherian@gmail.com wrote:
HI Ravi / Bala, It is great work that Tamil Wiki is doing by not allowing the dump of paid translators to article space directly, thereby maintaining the overall quality of the Wikipedia. But there are many other Indian language Wikipedias where the translators move the article directly to mainspace, especially on which the communities are weak in strength. Would like to hear inputs from other languages on this, please share with us. Especially on the number of articles created by Google translators Vs organic creation ( last one year is fine) and also thoughts on this project. There might be projects which are not even aware that something is happening on their Wikipedia. Regards Tinu Cherian N.B. Regarding what motivates us to contribute to Wikipedia for free is the reason for you and me being here :)
Hi all, It was interesting to read about the whole paid-editing issue as well as Google paying people to translate. I do not think it's a bad idea, though people turning in shoddy work is of course a cause for concern.
About translated articles moving to mainspace or otherwise, a basic question: how does one know whether an article is translated or written afresh? The only way I can think of is to check the wiki history. Or are there any signs which say 'X article is translated' or something to that effect?
I do not know any other language except Hindi and Marathi, and I am more fluent in Hindi. So can somebody post a link or two to translated Hindi articles so one can see how to tell whether an article is translated? I would also be interested to know what the 'google bots' have to write besides the article in order to get paid.
Looking forward to the info.
fluent in Hindi. So can somebody post a link or two of translated hindi article/s so one can see how to note /notice if an article is translated .
When a translated article is posted, there will be an edit summary mentioning that the article was translated using Google Translator Toolkit. This can be seen in recent changes and in the page history. In Tamil Wikipedia, once a user starts uploading such articles, we mark their user talk page and categorise them under Google translators so we can review their edits. The articles are also categorised under the Google translation project and get an info box at the top so casual readers know it is a Google-translated article. For Wikipedias that don't actively monitor the Google project, recent changes and page history are the only way to know.
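For wikis where the page history is the only signal, the edit summary of a page's first revision can also be read programmatically. The following is only a minimal sketch, assuming the standard MediaWiki web API (`api.php` with `action=query` and `prop=revisions`), which nobody in this thread has mentioned; the helper name `first_revision_query` and the title "Example" are hypothetical placeholders. It only builds the request URL and does not contact any wiki.

```python
from urllib.parse import urlencode


def first_revision_query(lang: str, title: str) -> str:
    """Build a MediaWiki API URL asking for the oldest revision of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "newer",                    # list the oldest revision first
        "rvprop": "user|timestamp|comment",  # "comment" is the edit summary
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


# "Example" is a placeholder title, not a real Google-translated article.
url = first_revision_query("ta", "Example")
print(url)
```

Fetching that URL and reading the `comment` field of the returned revision would give the summary left by whoever created the page.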
would be interested to also what 'google bots' have to write besides the article in order to get paid.
In Tamil Wikipedia, they earlier wrote that the article was translated from a particular version of the English Wikipedia article. Later they stopped doing so, so they don't write anything now. The upload of translated articles is a well-coordinated operation between the translators and Google, and I believe they have internal tools to review it. So they don't need to write anything to prove that they translated.
We are also in the process of writing a complete review of the project. It will have complete details, suggestions and best practices for such a project. We will post it in a month's time.
Thanks, Ravi
Looking forward for info.
Regards, Shirish Agarwal शिरीष अग्रवाल
My quotes in this email licensed under CC 3.0 http://creativecommons.org/licenses/by-nc/3.0/ http://flossexperiences.wordpress.com 065C 6D79 A68C E7EA 52B3 8D70 950D 53FB 729A 8B17
Just an example of a Google-translated article, on "Internet Protocol", in Kannada Wikipedia:
http://kn.wikipedia.org/wiki/%E0%B2%85%E0%B2%82%E0%B2%A4%E0%B2%B0%E0%B2%9C%E...)
Check its history http://kn.wikipedia.org/w/index.php?title=%E0%B2%85%E0%B2%82%E0%B2%A4%E0%B2%...
See the edit summary "(Translated from http://en.wikipedia.org/wiki/Internet_Protocol (revision: 408211110) using http://translate.google.com/toolkit with about 97% human translations.)" in the history of the article.
These are not automated Google bots. The translations are done by (very :) ) human translators paid by Google.
I am working on a tool with Phlip Tiju (User:Indianmagicians) to find stats on organically created articles, bot articles & Google translations. Will update soon.
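To illustrate how such a count could work in principle, here is a rough sketch. It is not the tool mentioned above. It assumes every Google translation carries a summary worded like the one quoted in this message, which may not hold on other wikis or for older uploads. The pattern `GTT_SUMMARY`, the function `classify` and the sample data are all hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the single edit summary quoted above; other wikis or
# toolkit versions may word it differently.
GTT_SUMMARY = re.compile(
    r"Translated from (?P<source>\S+) \(revision: (?P<rev>\d+)\) "
    r"using http://translate\.google\.com/toolkit "
    r"with about (?P<human>\d+)% human translations"
)


def classify(summary: str, user_is_bot: bool = False) -> str:
    """Return 'google', 'bot' or 'organic' for a page-creating edit."""
    if GTT_SUMMARY.search(summary):
        return "google"
    return "bot" if user_is_bot else "organic"


# Hypothetical sample of (edit summary, bot flag) pairs for new pages.
sample = [
    ("Translated from http://en.wikipedia.org/wiki/Internet_Protocol "
     "(revision: 408211110) using http://translate.google.com/toolkit "
     "with about 97% human translations.", False),
    ("new stub", False),
    ("bot: creating article", True),
]

counts = Counter(classify(summary, is_bot) for summary, is_bot in sample)
match = GTT_SUMMARY.search(sample[0][0])
print(counts)
print(match.group("source"), match.group("rev"), match.group("human"))
```

Matching on the summary text alone will miss translations uploaded without this wording, so any counts obtained this way should be treated as a lower bound.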
Regards Tinu Cherian
At bottom :-
On Sat, Feb 5, 2011 at 18:04, CherianTinu Abraham tinucherian@gmail.com wrote:
Just an example of Google translated article on "Internet Protocol" in Kannada Wikipedia. http://kn.wikipedia.org/wiki/%E0%B2%85%E0%B2%82%E0%B2%A4%E0%B2%B0%E0%B2%9C%E...) Check its history http://kn.wikipedia.org/w/index.php?title=%E0%B2%85%E0%B2%82%E0%B2%A4%E0%B2%... See the edit summary " (Translated from http://en.wikipedia.org/wiki/Internet_Protocol (revision: 408211110) using http://translate.google.com/toolkit with about 97% human translations.) " in the History of article. They are not Automated Google bots. These are done by (very :) ) human translators paid by Google. I am working on a tool with Phlip Tiju ( User:Indianmagicians ) to find stats on organically created articles , bot articles & Google translations. Will update soon. Regards Tinu Cherian
Hi all, Thank you Tinu Cherian. The reason why I asked for Hindi is:
a. I know the language, so if nothing else I can at least gauge how good or bad the translation is compared to the English original.
b. @Ravi: I would also get an idea of which sorts of articles are preferred, if any, i.e. whether some subjects are preferred over others, say 'computer science' over 'humanities' or 'arts'. Of course this may just be my own prejudice/bias, but it would be interesting to find out.
Also, a slightly OT query: has anybody tried any transliteration tools? Are there any good FOSS tools that people have tried and can recommend?
Hi all, This is the link for all Google-translated articles in Hindi Wikipedia in the last 5 months: http://hi.wikipedia.org/w/index.php?title=%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7:AbuseLog&wpSearchFilter=63
Thank you and regards Mayur
In Bengali Wikipedia it is not exactly a ban, but we thoroughly check the quality of translated content. If the translation is not in standard Bengali, we do not accept it. Most of the time we got junk translations from the Google translators. If anyone translates English Wikipedia content for Bengali Wikipedia in proper Bengali, we will accept it.
With Warm Regards, *Jayanta Nath* Calcutta, West Bengal Facebook: http://www.facebook.com/jayantanth Wikipedia: http://en.wikipedia.org/wiki/User:Jayantanth Let us build a piracy-free India: let us all use free software and encourage others to use it.