Hello All,
Recently there have been a lot of discussions (on this list as well) regarding Google's translation project for some of the big language Wikipedias. The Foundation also seems to have approved Google's efforts. But I am not sure whether anyone has been interested in consulting the respective language communities to learn their views.
As far as I know, only Tamil, Bengali, and Swahili Wikipedians have raised concerns about Google's project. But does this mean that other communities are happy with Google's efforts? If a Wikipedia has no active community, how can we expect a response from it? And if there is no response from a community, does that mean Google can hire some native speakers and use machine translation to create articles for that Wikipedia?
Now let us go back to a basic question. Does the WMF require a wiki community to create a Wikipedia in a given language? Or can it use the services of companies like Google to create Wikipedias in any number of languages?
One of the main points raised by supporters of Google translation is that Google's project is good *for the online presence of the language*. That might be true. But nobody has cared to verify whether it is good for Wikipedia.
As pointed out by Ravi in his presentation at Wikimania (http://docs.google.com/present/view?id=ddpg3qwc_279ghm7kbhs), Google translation of Wikipedia articles:
- will affect the organic growth of a Wikipedia article
- will create copies of English Wikipedia articles in local wikis
- goes against some of the basic philosophies of Wikipedia
People outside the wiki will definitely benefit from this tool if a Google translation tool is developed for each language. I saw a working example of this in Poland during Wikimania, when some people who were not good at English used Google Translate to communicate with us. :)
Apart from the points raised by Ravi in his presentation, this will affect community growth. If there is no active wiki community, how can we expect anyone to look after all these junk articles uploaded to the wiki every day? When all the important article links have already turned blue, how can we expect any future potential editors? So in my view, Google's project is killing the growth of active wiki communities.
Of course, the Tamil Wikipedia is trying to use the Google project effectively. But only Tamil is doing that, since they have an active wiki community. *Many wiki communities are not even aware that such a project is happening on their wiki.*
I do not want to point out specific language Wikipedias to prove my point. But visit the Wikipedias (especially those that use non-Latin scripts) to see the status of the Google translation project. Loads of junk articles are uploaded every day. Most of the time, the only edits to these articles are by the creator and the interlanguage wiki bots.
This effort will definitely affect community growth. Kindly see the points raised by a Swahili Wikipedian: http://muddybtz.blog.com/2010/07/16/what-happened-on-the-google-challenge-the-swahili-wikipedia/. Many Swahili users (and users of other languages) now expect a laptop or some other monetary benefit for writing in their Wikipedia. That hurts community growth.
So what is the solution? Can we take lessons from the Tamil/Bengali/Swahili Wikipedias and find methods to use this service effectively, or do we continue with the current article creation process?
One last question: is the tool that Google is developing open source? If not, we will need to answer the many questions that may follow.
Regards
Shiju Alex http://en.wikipedia.org/wiki/User:Shijualex
I think the answer is "yes and no". As with any new project/concept/idea/trial, there are pros and cons. The real question is: do the pros outweigh the cons?
From just reading what you linked (and not being involved in any way with these language projects), and from my own personal experience of how I work on Wikipedia: yes, I think it is a good thing overall.
From what I've seen, it is much easier to convince someone who has never edited to fix grammatical, spelling, or other "simple" mistakes. Generally people don't dive in and write/translate entire articles - the barrier to entry is simply too high. These pre-translated articles give people an "in": they are already there, and they have obvious errors that are easy to fix.
More "ok" content is better than no content, at least if I have my druthers.
-Jon
Hi,
On Sun, Jul 25, 2010 at 3:52 PM, Jon Davis wiki@konsoletek.com wrote:
From what I've seen, it is much easier to convince someone who has never edited to fix grammatical, spelling, or other "simple" mistakes. Generally people don't dive in and write/translate entire articles - the barrier to entry is simply too high. These pre-translated articles give people an "in": they are already there, and they have obvious errors that are easy to fix.
In my experience at Transcom, and my own experience as a translator, people appreciate pre-translated articles only when they are of good quality. Some pre-translations are of such bad quality that they contain too many obvious errors to fix in a reasonable time frame.
I've seen several requests, both on Meta and on language projects, to delete this kind of bad-quality "translation", where people think it better to write a new version from scratch.
And in my observation, Google translation is still at this level in many languages. Even between Western languages, unless one of them is English, results may be of poor quality (e.g. it cannot keep the distinction between tu/vous, du/Sie, etc.).
Cheers,
More "ok" content is better than no content, at least if I have my druthers.
-Jon
On Sat, Jul 24, 2010 at 23:12, Shiju Alex shijualexonline@gmail.com wrote:
Hello All,
Recently there are lot of discussions (in this list also) regarding the translation project by Google for some of the big language wikipedias. The foundation also seems like approved the efforts of Google. But I am not sure whether any one is interested to consult the respective language community to know their views.
As far as I know only Tamil, Bengali, and Swahili Wikipedians have raised their concerns about Google's project. But, does this means that other communities are happy about Google efforts? If there is no active community in a wikipedia how can we expect response from communities? If there is no response from a community, does that mean that Google can hire some native speakers and use machine translation to create articles for that wikipedia?
Now let us go back to a basic question. Does WMF require a wiki community to create wikipedia in any language? Or can they utilize the services of companies like Google to create wikipedias in N number of languages?
One of the main point raised by the supporters of Google translation is that, Google's project is good *for the online version of the language*.That might be true. But no body is cared to verify whether it is good for Wikipedia.
As pointed out by Ravi in his presentation in Wikimania, ( http://docs.google.com/present/view?id=ddpg3qwc_279ghm7kbhs), the Google translation of wikipedia articles:
- will affect the biological growth of a Wikipedia article - will create copy of English wikipedia article in local wikis - it is against some of the basic philosophies of wikipedia
The people outside wiki will definitely benefit from this tool, if Google translation tool is developed for each language. I saw the working example of this in Poland during Wikimania, when some people who are not good in English used google translator to communicate with us. :)
Apart from the points raised by Ravi in his presentation, this will affect the community growth.If there is no active wiki community, how can we expect them to look after all these junk articles uploaded to wiki every day. When all the important article links are already turned blue, how we can expect any future potential editors. So according to me, Google's project is killing the growth of an active wiki community.
Of course, Tamil Wikipedia is trying to use Google project effectively. But only Tamil is doing that since they have an active wiki community*. Many Wiki communities are not even aware that such a project is happening in their wiki*.
I do not want to point out specific language wikipedas to prove my point. But visit the wikipedias (especially wikipedias* that use non-latin scripts*) to view the status of google translation project. Loads of junk articles are uploaded to wiki every day. Most of the time the only edit in these articles is the edit by its creator and the inter language wiki bots.
This effort will definitely affect community growth. Kindly see the points raised by a Swahali Wikipedian< http://muddybtz.blog.com/2010/07/16/what-happened-on-the-google-challenge-th...
.
Many Swahali users (and other language users) now expect a laptop or some other monitory benefits to write in their wikipedia. That affects the community growth.
So what is the solution for this? Can we take lessons from Tamil/Bengali/Swahili wikipedias and find methods to use this service effectively or continue with the current article creation process.
One last question. Is this tool that is developing by Google is an open source tool? If not, we need to answer so many questions that may follow.
Regards
Shiju Alex http://en.wikipedia.org/wiki/User:Shijualex _______________________________________________ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
-- Jon [[User:ShakataGaNai]] / KJ6FNQ http://snowulf.com/ http://ipv6wiki.net/ _______________________________________________ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Aphaia, a great deal of confusion has been created with regard to this project. I hope you'll allow me to attempt to clear it up.
These are NOT articles that were translated directly by Google Translate. Rather, they were created using Google Translator Toolkit, which requires human intervention by a speaker of the language: someone to check and correct every single translated sentence, in the case of languages where Google already has machine translation, or to write entirely new _human_ translations with the aid of translation memory software, in the cases where no Google Translate module exists (for example, Tamil).
I currently work as a translator and have found that Google Translator Toolkit is great for speeding up and improving the consistency of translations; at least the results of my work are usually better with it than without. (I'm glad for the consistency: if I'm translating a large document, I want to translate the same phrases the same way every time they occur, rather than using slightly different wording the second time around.) Since the translations are revised and corrected by a human, they _should_ have the same level of grammatical correctness, comprehensibility, and translation quality as a pure human translation. If they don't, that is the fault of the person using the toolkit, not of the software itself.
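For anyone who hasn't used translation memory software, the core idea is small enough to sketch in a few lines of Python. This is only a toy illustration of the lookup that produces that consistency; the class, the exact-match store, and the Finnish example string are my own simplifications, not anything from the toolkit's actual implementation:

# Minimal translation-memory sketch: remember the translation a human
# approved for each source segment, and offer it again whenever the same
# segment recurs. Real TM systems add fuzzy matching on top of this.

class TranslationMemory:
    def __init__(self):
        self.store = {}  # source segment -> approved target segment

    def add(self, source, target):
        self.store[source] = target

    def suggest(self, source):
        return self.store.get(source)  # None if the segment is new

tm = TranslationMemory()
tm.add("Wikipedia is a free encyclopedia.",
       "Wikipedia on vapaa tietosanakirja.")

# The second occurrence of the sentence gets exactly the same translation
# as the first, which is what keeps a long document consistent.
print(tm.suggest("Wikipedia is a free encyclopedia."))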
-m.
Thanks for your clarification, Node.ue; I know this because I attended their presentation at Wikimania. It is an ambitious project and I'd like to see it grow, but at the moment there seems to be a serious problem in the system. They seem to use English as a pivot language, assuming all translations are first done into English and then into the other language. Yet on the major non-English Western-language Wikipedias, a fair share of translations (1/3, IIRC) do not involve English at all.
If it works for you, that's fine, but please be aware that it might not work as well for non-English speakers as it does for you.
Cheers,
Aphaia, any machine translation system that produces even remotely comprehensible results should be usable for machine-aided translation. Its utility is low if the output is complete gibberish, but that doesn't seem to be the case here; regardless, it's possible to turn off automatic translation and use the system merely as a translation memory, which would be useful if the automatic translation really did produce gibberish. Still useful, I think, because it automatically breaks text into segments and is at least *intended* to preserve formatting (this seems to be an issue for WP articles) without requiring users to re-type every single wikilink.
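To make the segmentation point concrete, here is a rough sketch of sentence splitting that keeps wikilinks intact. The placeholder trick and the naive regexes are my own simplifications for illustration; they are not the toolkit's actual segmentation rules:

import re

# Rough sketch: treat [[wikilinks]] as opaque tokens, so punctuation
# inside a link (e.g. [[J. R. R. Tolkien]]) cannot split a sentence.
# Real segmenters handle far more cases (abbreviations, quotes, ...).

def segment(text):
    # Protect wikilinks by swapping them for placeholders.
    links = re.findall(r"\[\[.*?\]\]", text)
    for i, link in enumerate(links):
        text = text.replace(link, f"\x00{i}\x00", 1)
    # Naive split on sentence-final punctuation followed by whitespace.
    segments = re.split(r"(?<=[.!?])\s+", text)
    # Restore the links inside each segment.
    restored = []
    for seg in segments:
        for i, link in enumerate(links):
            seg = seg.replace(f"\x00{i}\x00", link)
        restored.append(seg)
    return restored

print(segment("He wrote [[The Hobbit]]. It was published in 1937."))
# -> ['He wrote [[The Hobbit]].', 'It was published in 1937.']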
-m.
I've seen several requests, both on Meta and on language projects, to delete this kind of bad-quality "translation", where people think it better to write a new version from scratch.
Uhm. On pl.wiki, Google Translate is evil. Translations made with Google Translate are deleted (though not speedily). Users who use Google Translate for mass production of articles are blocked.
So it's generally a problem of copying (articles, ideas, etc.) from en.wiki (the most popular source):
http://pl.wikipedia.org/wiki/Wikipedia:Enwikizm
"Not all things in en wiki are good. Just don't copy thoughtlessly."
przykuta
Can we clarify here: are we talking about Google Translate or Google Translator Toolkit?
-m.
About Google Translate, I think.
przykuta
Well - this seems a bit confusing. I think Shiju Alex was talking about the toolkit, but I got the impression you're referring to Google Translate, which I agree is always unsuitable for producing usable articles.
-m.
On Sun, Jul 25, 2010 at 8:33 AM, Mark Williamson node.ue@gmail.com wrote:
I think Shiju Alex was talking about the toolkit, but I got the impression you're referring to Google Translate, which I agree is always unsuitable for producing usable articles.
Machine translation is always unsuitable for producing usable articles, but it can help to start new ones on smaller Wikipedias.
If we want to use machine translation, we should try a free project like Apertium:
http://www.apertium.org/
http://wiki.apertium.org/wiki/Main_Page
irc://irc.freenode.net/apertium
--- On Sun, 25/7/10, Fajro faigos@gmail.com wrote:
Machine translation is always unsuitable for producing usable articles, but it can help to start new ones on smaller Wikipedias.
I second that. About 50% of machine translation output is gibberish, or worse, plausible-sounding text that actually says the opposite of what the original said. To get it into readable form takes about as long as starting from scratch.
Translation memory software only helps where content is repetitive.
A.
Consider the Malayalam sentence "വിക്കിപീഡിയ ഒരു നല്ല വിജ്ഞാനകോശം ആണ്", which means "Wikipedia is a good encyclopedia". How can anyone understand the result if a translator picks the meaning of each Malayalam word and creates an English sentence like "wikipedia one good encyclopedia is"? Now think about more complex sentences. The sentence structure of Indian languages is completely different from that of English or other European languages. Google's current attempt puts extra weight on tiny communities by pushing complete rewrites onto them (the easiest way out is deletion, because some sentences make no sense at all). I am not against machine translation, but Google must improve its tool or toolkit before trying it on small Wikipedias.
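To see why word-for-word substitution fails here, consider this toy sketch. The gloss dictionary and the single reordering rule are hypothetical stand-ins, far simpler than any real MT system, but they reproduce the SOV-to-SVO problem from the example above:

# Toy illustration of why word-by-word translation fails for SOV
# languages like Malayalam; the gloss dictionary is a made-up stand-in.

gloss = {
    "വിക്കിപീഡിയ": "wikipedia",
    "ഒരു": "one",          # also the indefinite article "a"
    "നല്ല": "good",
    "വിജ്ഞാനകോശം": "encyclopedia",
    "ആണ്": "is",           # copula, sentence-final in Malayalam
}

sentence = ["വിക്കിപീഡിയ", "ഒരു", "നല്ല", "വിജ്ഞാനകോശം", "ആണ്"]

# Naive word-by-word replacement keeps the Malayalam (verb-final) order:
words = [gloss[w] for w in sentence]
print(" ".join(words))
# -> "wikipedia one good encyclopedia is"

# A real translator must also reorder: moving the sentence-final copula
# after the subject yields English SVO order.
reordered = [words[0], words[-1]] + words[1:-1]
print(" ".join(reordered))
# -> "wikipedia is one good encyclopedia"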
On Wed, Jul 28, 2010 at 10:20 AM, praveenp me.praveen@gmail.com wrote:
I am not against machine translation, but Google must improve its tool or toolkit before trying it on small Wikipedias.
Neither Google nor the WMF is creating articles automatically via machine translation. Google is not pushing translated articles.
The Toolkit is a page where you can see a (sometimes not so good) translation, and you are able (if you want) to complete or fix it.
When you believe it is complete, you upload it to Wikipedia, just as you would upload a fully manual translation once you consider it complete.
On Wednesday 28 July 2010 09:19 PM, Pedro Sanchez wrote:
The Toolkit is a page where you can see a (sometimes not so good) translation, and you are able (if you want) to complete or fix it.
Unfortunately, that is not what is actually happening. It looks like somebody is hiring someone, and what they are creating is a database of words. :(
On Wed, Jul 28, 2010 at 1:40 PM, praveenp me.praveen@gmail.com wrote:
Unfortunately, that is not what is actually happening. It looks like somebody is hiring someone, and what they are creating is a database of words. :(
From my experience on the Bengali Wikipedia, many GTT-assisted edits are unsalvageable. This is not a fault of GTT per se, but rather of the model Google followed here. Of course GTT does not provide a translation magically, but the translators hired by Google did an awful job on the first drafts of their translations and never fixed them. If you show a volunteer a one-paragraph stub with problems, they are happy to go and fix it. But when you bring a 100 KB full article where every sentence needs fixing, the volunteers just give up. Even seasoned Wikipedians are not willing to devote several hours to a complete rewrite of an article ... a manual translation from scratch takes much less time.
That said, last week one of the translators came back with a much better version of an article, and we allowed the translator to create it in user space. If the translation passes the community's standards, we will move it to the main namespace. So we are not completely blocking/banning such paid translations; rather, we banned bad, unfixed, unreadable translations, and translators who were not willing to fix their problems.
-- Ragib
On Sun, 25 Jul 2010 11:04:42 -0300, Fajro wrote:
On Sun, Jul 25, 2010 at 8:33 AM, Mark Williamson node.ue@gmail.com wrote:
I think Shiju Alex was talking about the toolkit, but I got the impression you're referring to Google Translate, which I agree is always unsuitable for producing usable articles.
Machine translation is always unsuitable for producing usable articles, but it can help to start new ones on smaller Wikipedias.
Unedited MT is always unsuitable, rather.
If we want to use machine translation, we should try a free project like Apertium:
Apertium *is* used to translate Wikipedia articles. The difference is that we concentrate on producing rule-based translators between related languages, where the results can be quite impressive. I wouldn't recommend that anyone use our English-Catalan translator for a Wikipedia article - there will simply be too much work involved in making it readable. Our Spanish-Catalan translator, on the other hand, will do quite a good job of it.
In theory, statistical MT should also be better with related languages (though I haven't seen anyone working on it). Google isn't 'pure' SMT, though; much of their data comes from translating via English, so even when there's no ambiguity between two languages, Google will find some based on English.
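Here is a toy sketch of how pivoting through English introduces that ambiguity. The word lists are hypothetical, but the example mirrors the tu/vous, du/Sie point raised earlier in this thread:

# Toy illustration of pivot translation: French -> English -> German.
# Both French pronouns collapse onto English "you", so the pivot step
# destroys the formal/informal distinction that a direct French->German
# system could preserve.

fr_to_en = {"tu": "you", "vous": "you"}
en_to_de = {"you": ["du", "Sie"]}  # ambiguous: informal vs formal

def pivot_translate(french_word):
    english = fr_to_en[french_word]
    return en_to_de[english]  # every candidate; the right one is lost

print(pivot_translate("tu"))    # ['du', 'Sie'] - should be just 'du'
print(pivot_translate("vous"))  # ['du', 'Sie'] - should be just 'Sie'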
The quality of translation of an SMT system greatly depends on the type of text it was trained with. Articles relating to computing, law, and medicine will translate much better than, say, articles about history, because those are the types of text for which translations are most widely available.
Two things:
1) Please define "junk articles". Do you mean articles that you think nobody in your community wants to read (say, an article about an American singer or actor, for example [[Lady Gaga]]), or do you mean articles that are written in such a way as to be incomprehensible, or are filled with linkspam, etc.? Or do you mean something else entirely? Please explain.
2) Community is certainly important, but aren't we here to write an encyclopedia? I don't think having all links turned blue is a bad thing at all. In fact, it seems to me that over time, a larger article base will result in more users joining. Note that I said over time; in the short term, it may not have much effect.
-m.
2010/7/25 Shiju Alex shijualexonline@gmail.com:
Recently there have been a lot of discussions (on this list as well) regarding Google's translation project for some of the big language Wikipedias. The Foundation also seems to have approved Google's efforts. But I am not sure whether anyone has been interested in consulting the respective language communities to learn their views.
At the same session at Wikimania a very sensible approach was presented by Mikel Iturbe from the Basque Wikipedia:
* They didn't use Google Translate but an academically developed tool, which also happened to be Free Software - this diminished the arguments about commercialization.
* The editors community was involved throughout the whole process.
* Articles were not uploaded without correcting mistakes that the translation software made.
* What's also important, the corrections were reported to the translation software developers, so they would try to improve it.
Of course, not every language community can afford to develop Free-as-in-speech academic translation software, but the other points are useful to everybody.
Mikel Iturbe's presentation:
* http://www.slideshare.net/janfri/wikimania2010
The academic papers related to that project:
* http://ixa.si.ehu.es/openmt2/argitalpenak_html
* http://ixa.si.ehu.es/Ixa/Argitalpenak/Artikuluak/index_html?Atala=Artikulua_Itzulpen_automatikoa
On Sun, 25 Jul 2010 18:10:54 +0300, Amir E. Aharoni wrote:
At the same session at Wikimania a very sensible approach was presented by Mikel Iturbe from the Basque Wikipedia:
* They didn't use Google Translate but an academically developed tool, which also happened to be Free Software - this diminished the arguments about commercialization.
Probably Matxin (http://sourceforge.net/projects/matxin/)
Matxin is somewhat related to Apertium, which I am involved with. Some Apertium developers tried to make it less Basque-specific, but weren't entirely successful.
Of course, not every language community can afford to develop Free-as-in-speech academic translation software, but the other points are useful to everybody.
Depending on the languages involved, the amount of resources available for those languages, and having realistic expectations, a usable system can be made in as little as 3-6 months by a single motivated volunteer, with help from experienced developers. Earlier this year, at the request of Crisis Commons, 3 of us built a Haitian Creole to English prototype in less than a week.
Staying motivated is *hard*. We have 2-3 times as many half-working prototypes as we have released language pairs. Having realistic expectations is hard. People want English, and/or they want to include *everything* (budget at least a year of full time work for anything to English).
If you know the difference between noun, adjective, and verb, understand Zipf's law, and want open source MT for a pair of languages, come find us on #apertium on FreeNode. We'll be happy to help.
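(To make the Zipf's law remark concrete: a small number of word forms accounts for most running text, which is why a dictionary covering only the most frequent forms already handles a surprising share of any input. A minimal sketch of that coverage calculation in Python - "corpus.txt" is a made-up file name, and this only illustrates the idea, it is not Apertium code:

    from collections import Counter

    # Count word-form frequencies in any raw-text corpus
    # ("corpus.txt" is a hypothetical file name).
    with open("corpus.txt", encoding="utf-8") as f:
        counts = Counter(f.read().lower().split())

    total = sum(counts.values())
    # Zipf's law in practice: the top few thousand forms cover most tokens,
    # which tells a dictionary writer where to spend effort first.
    for n in (100, 1000, 10000):
        covered = sum(c for _, c in counts.most_common(n))
        print("top %6d forms cover %5.1f%% of tokens" % (n, 100.0 * covered / total))

On typical corpora the most frequent thousand or so forms already cover a large majority of tokens, which is why a small, focused volunteer effort can go so far.)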
On 1 August 2010 03:30, Jimmy O'Regan joregan@gmail.com wrote:
Depending on the languages involved, the amount of resources available for those languages, and having realistic expectations, a usable system can be made in as little as 3-6 months by a single motivated volunteer, with help from experienced developers. ...
Hmm. This sounds to me like something it would be on-mission for WMF to fund a developer for.
Of course, there are lots of other quick wins we could achieve with paid developer time. But this strikes me as something to keep in mind.
- d.
On Sunday 25 July 2010 08:12:43 Shiju Alex wrote:
So what is the solution for this? Can we take lessons from the Tamil/Bengali/Swahili wikipedias and find methods to use this service effectively, or should we continue with the current article creation process?
I was thinking about a website that would have static copies of all Wikipedia articles translated into all languages. That should dissuade people from using Google Translate to make Wikipedia articles, since the articles would already be online; and even if someone did that, admins would have community support for deleting such articles, because they would already exist online. And if someone wanted to fix a Google Translate translation and make a real article, they could do that too...
As an admin on Bengali wikipedia, I have had to deal with this issue a lot (some of which was discussed in the Telegraph (India) newspaper article). But I'd like to elaborate our stance here:
(The tool used was Google Translation Toolkit, not Google Translate. There is a distinction between these two tools: Google Translation Toolkit (GTT) is a translation-memory-based, semi-manual translation tool. That is, it learns translation skills as you gradually translate articles by hand; later, this can be used to automate translation.)
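(For readers unfamiliar with the mechanism: a translation memory is essentially a store of human-approved source/target segment pairs that is consulted before anything is translated automatically. A toy sketch of that lookup idea in Python - purely illustrative, with made-up example segments, and not how GTT is actually implemented:

    from difflib import get_close_matches

    # Toy translation memory: a store of human-corrected segment pairs,
    # reused when the same or a similar source segment appears again.
    memory = {}

    def remember(source, target):
        # Record a human-approved translation of one segment.
        memory[source] = target

    def suggest(source):
        # Try an exact hit first, then fall back to a fuzzy match.
        if source in memory:
            return memory[source]
        close = get_close_matches(source, list(memory), n=1, cutoff=0.8)
        return memory[close[0]] if close else None

    remember("Dhaka is the capital of Bangladesh.",
             "ঢাকা বাংলাদেশের রাজধানী।")
    print(suggest("Dhaka is the capital of Bangladesh."))

The more segments translators feed it, the more it can automate later - which is also why the memory itself becomes valuable, as discussed further down the thread.)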
Issues:
1. Community involvement: First of all, the local community was not at all involved in or informed about this project. All of a sudden, we found new users signing up, dropping a large article on a random topic, and moving away. These users never responded to any talk page messages, so we first assumed they were just random users experimenting with wikipedia.
Even now, no one from Google has contacted us on Bengali wikipedia to inform us about Google's intentions. This is not a problem by itself, but see the following points.
2. Translation quality: The quality of the translations was awful. The translations added to Bengali wikipedia were artificial, dry, and used obscure words and phrases. It looked as if a non-native speaker had sat down with a dictionary in hand and mechanically translated each sentence word by word. That led to sentences that are hard to understand, or downright nonsensical.
The articles were half-done. Numerals were not translated at all. The punctuation symbol for the Bengali language (the "danda" symbol: । ) was not used (apparently GTT and/or the Google transliteration tool does not support it).
The articles were also full of spelling mistakes. The paid translators misspelled many simple words, or even used different spellings for the same word in different parts of an article.
Finally, different languages have different sentence structures. Sometimes a complex sentence is better expressed if broken up into two sentences in another language. We found that the translators simply translated sentences preserving their English structure. This made the resulting Bengali sentences awkward and artificial to read. For example, we do not write "If x then y" in Bengali just by replacing "if" and "then" with the corresponding Bengali words. But the translators did exactly that; apparently this is an artifact of using GTT.
3. Lack of follow-up: When we found the above problems, naturally, we asked the contributor to fix them. We got no reply. It is NOT the task of volunteers to clean up the mess after the one-night-standish paid translators. Given the small number of volunteers active at any given moment, it would take enormous effort on our part to go through these articles and fix the punctuation, spelling, and grammar issues - not to mention the awkward language style used by the translators.
So, after getting a cold shoulder from the paid translators about fixing their mess, we had to ban such edits outright. We didn't know who was behind this until the Wikimania talk from Google. Not that it matters ... even now, we won't allow these half-done and badly translated articles on Bengali wikipedia.
Bengali wikipedia is small (21k articles), but we do not want to populate it overnight with badly translated content, some of which won't even qualify as grammatically correct Bengali. While wikipedia may be a perpetual work in progress, that does not mean we need to be guinea pigs for careless experiments. So our stance is, "Thanks, but NO thanks!" Unless, of course, they put enough commitment into the translations and fix the mistakes.
We welcome automation in translation, but not at the expense of introducing incorrect and messy content on wikipedia. We'd rather stay small and hand-craft articles than let an experimental tool and unskilled paid translators create a big mess.
Thanks
Ragib (User:Ragib on en and bn)
-- Ragib Hasan, Ph.D NSF Computing Innovation Fellow and Assistant Research Scientist
Dept of Computer Science Johns Hopkins University 3400 N Charles Street Baltimore, MD 21218
Website: http://www.ragibhasan.com
On Tue, Jul 27, 2010 at 8:38 PM, Ragib Hasan ragibhasan@gmail.com wrote:
Another issue: The resulting translation memory is not free.
On Tue, Jul 27, 2010 at 8:43 PM, Fajro faigos@gmail.com wrote:
Another issue: The resulting translation memory is not free.
My guess is that the translation memory will be used to enhance Google Translate (the automated translator). That is probably a reason for creating these translations in the first place.
(See http://en.wikipedia.org/wiki/Google_Translate : "According to Och, a solid base for developing a usable statistical machine translation system for a new pair of languages from scratch, would consist in having a bilingual text corpus (or parallel collection) of more than a million words and two monolingual corpora of each more than a billion words")
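(Rough arithmetic on those figures, just to give a sense of scale - the ~500 words per article is my own guess, not something from the quote:

    # Back-of-the-envelope scale of Och's corpus figures.
    words_per_article = 500            # assumed average length; my own guess
    parallel_words = 1_000_000         # bilingual corpus size Och mentions
    monolingual_words = 1_000_000_000  # monolingual corpus size per language

    print(parallel_words // words_per_article)     # ~2,000 translated articles
    print(monolingual_words // words_per_article)  # ~2,000,000 articles' worth

If that guess at article length is anywhere near right, a few thousand human-corrected article translations already approach the parallel corpus Och describes, which would fit the guess above about why these translations are valuable to Google.)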
-- Ragib
Fajro wrote:
Another issue: The resulting translation memory is not free
This is a red herring. Some real and important issues have been raised about machine translations, but this is not one of them.
The fact that the source code for the translation processes is not free does not make the results of such machine translations unfree. Key to anything being copyrightable is that the material must be original and not the result of a mechanical process. Machine translations are mechanical processes. Another person using the same software with the same text should get the same results.
It is also important that the allegedly infringing text must have been fixed in some medium. A person issuing a takedown order must show, as a necessary element of that order, where the material in question was previously published. Two identical texts by different authors need not be copies of each other. With human efforts two such identical texts are highly improbable, but this need not be the case with machine translation. Indeed, if the same software keeps producing different results, I would question its reliability.
Ray
I'm not sure that's exactly the question. Rather, by using GTTK, people are contributing to building [[Translation memory]] for Google, which Google can in turn use to build its statistical models. It's not that we're using non-free software, but rather that we're contributing to it.
-m.
We welcome automation in translation, but not at the expense of introducing incorrect and messy content on wikipedia. We'd rather stay small and hand-craft articles than let an experimental tool and unskilled paid translators create a big mess.
Yes. This is the answer that you will get from most of the active small-wiki communities where this project is going on. Many of the small wiki communities are not as worried about numbers as some big wikipedias are. Quality is more important for small wikis when the number of contributors is low. *Many of us will use this focus on quality* itself to bring in more people.
My real concern is about the rift that is happening in language communities due to this project. Issues of a language wiki are taken outside the wiki to prove points against its contributors. Two types of communities are evolving out of this project: *Google's wiki community* and *Wiki's wiki community*. :) This is really annoying as far as small wikis are concerned.
So, some sort of intervention is required to make sure this project runs smoothly on the different wikipedias.
~Shiju
Dear colleagues,
My experiences with the Translate Kit are negative, too. It happened just too often that a sentence was so twisted that I did not understand it. Checking it against the original took me a lot of time, so I decided that doing the translation myself is much quicker and more reliable. It is good for nobody to read Wikipedia articles in gibberish. The idea that the translation tool does the work and a human being has to make just a few small corrections has simply failed. Especially negative, to me, was that the Translator Kit encourages you to translate sentence by sentence.

I don't want to do anyone an injustice, but in my view there are two groups of Wikipedians:
- those who want to see huge article numbers and believe that any article with any content is good, in any quality, and that the Wikipedians are sufficient to do the rest;
- those who believe that (at least a minimum of) quality is important and that articles below a certain level do damage to a Wikipedia. The small number of Wikipedians cannot cope with the work. They welcome not just any content, but content that meets the possible interests of their readers.

It seems to me that the first group is mainly populated by computer specialists and native speakers of English, and the second by language specialists and non-native speakers of English. But of course there are many exceptions.
Kind regards Ziko van Dijk
Just to be sure I understand... What's happening here is that human beings, using a software tool, are translating articles from the English Wikipedia into a variety of other languages and posting them on the comparatively small Wikipedia projects in those languages. The source articles are of unknown intrinsic quality, and the translations are usually mid- to low-quality.
In the projects with an active community, some have rejected these articles because they are not high quality and because the community refuses to be responsible for fixing punctuation and other errors made by editors who are not members of the community. In the projects without an active community, Wikimedians (who may not speak any of the languages affected by the Google initiative) are objecting for a variety of other reasons - because the software used to assist translation isn't free, because the effort is managed by a commercial organization or because the endeavor wasn't cleared with the Wikimedia community first. Some are also concerned that these new articles will somehow deter new editors from becoming involved, despite clear evidence that a larger base of content attracts more readers, and more readers plus imperfect content leads to more editors.
What I find interesting is that few seem to be interested in keeping or improving the translated articles; Google's attempt to provide content in under-served languages is actually offending Wikimedians, despite our ostensible commitment to the same goal. Concerns like bureaucratic pre-approval, using free software, etc. are somehow more important than reaching more people with more content. It all seems strange and un-Wikimedian to me. Obviously there are things Google should have done differently. Maybe working with them to improve their process should be the focus here?
2010/7/28 Nathan nawrich@gmail.com:
Just to be sure I understand...
It's good that you ask, indeed. :-)
No, it's not about free software, and the Wikimedians are not too snobby or lazy to correct poor language. That is what I frequently do on de.WP and eo.WP, as I suppose Ragib and many others do as well. The point is: the machine-translated articles are often so bad that I simply don't understand them. I *cannot* correct them, because I don't know what they are saying.
Kind regards Ziko
Ziko, again, we are not talking about machine translations; Google doesn't have machine translation for Bangla, Malayalam, Tamil etc. yet. This is about translation memory.
One of the things about machine-assisted translation (MAT) - whose use among professional translators is still debated, though it is most popular for translating time-dependent material like news - is that the initial output is often a very rough translation that requires a _lot_ of editing. The biggest problem is not the toolkit itself (with some exceptions - punctuation and templates, for example) but the translators who do not bother to use it properly, creating poor translations with lots of spelling mistakes and leaving behind a wasteland of poor-quality articles.
GTTK can be used as a force for good if someone puts in the appropriate time and effort; when used _properly_ by a careful, knowledgeable translator who allows ample time for proofreading, articles created with it should be virtually indistinguishable from any other article.
It is my thought that the huge problem here is lack of engagement with communities. Essentially, Google swooped down and started dropping large amounts of poor quality content on our projects without engaging the people from those communities. The people in Google's contest also didn't engage the communities, nor did they respond to requests to improve their content.
-m.
Mark Williamson:
GTTK can be used as a force for good if someone puts in the appropriate time and effort; when used _properly_ by a careful, knowledgeable ...
It is my thought that the huge problem here is lack of engagement with communities. Essentially, Google swooped down and started dropping ...
Agreed. Again, in my experience it is quicker and gives better quality to translate on your own. If others have different experiences (it may depend on the language), okay. It seems that something went very wrong in telling people how to contribute to a Wikipedia language version. Could you report more about that, Mark?
Kind regards Ziko
Well, my impression, and I'm by no means an expert in this (I'm not associated with Google), is that they emphasized quantity over quality and forgot to mention the importance of community to our projects.
I heard that for the Swahili Wikipedia contest at least, they gave away prizes... but perhaps they should've included a requirement that the articles they created be rated as "good" by the community, not full of errors and nonsense sentences, and that all project participants who want any chance at winning must respond to all talkpage messages within 72 hours (or something like that).
I think telling a group of newbies that they'll get a big prize if they translate the most articles is a recipe for disaster. What incentive do they have to make sure their translation is of good quality? What incentive do they have to stick around afterwards?
-m.
On Wed, Jul 28, 2010 at 6:56 PM, Mark Williamson node.ue@gmail.com wrote:
I heard that for the Swahili Wikipedia contest at least, they gave away prizes... but perhaps they should've included a requirement that the articles they created be rated as "good" by the community, not full of errors and nonsense sentences, and that all project participants who want any chance at winning must respond to all talkpage messages within 72 hours (or something like that).
I have been involved with two big pushes by Google on the Arabic Wikipedia: one was by professional paid translators, the other was done completely by a volunteer organization in collaboration with Google. I supported both efforts heavily. In the latter, they recruited mostly university students to do the work, and there was very little to earn beyond recognition. All the problems mentioned above plagued both efforts. While the second one had slightly better results than the first, the vast majority of the translated articles lay ignored in user space (the consensus on ar.wp was to confine them to user space until deemed good), the efforts to contact and teach either the volunteers or the paid translators were futile, and the articles had some very awkward sentence structures, some very bad jargon translation, etc.
I have reached the opinion that the gradual nature of collaboration on Wikipedia is what makes our good and excellent articles what they are. I think a very small percentage of wikipedians started by writing a full-length article; instead, most of us started with a small edit to another article, then a bigger edit after it, and so on. By the time we began writing whole articles, we had enough knowledge of the community and the wiki syntax to produce good results. Whenever someone has a question about terminology, it gets discussed on the VP; whenever someone is unsure, they recruit other people to review or help. This was all missing from the effort, and I think that is what caused most of the problems.
Does anybody have more information about what exactly Google told the people? A link? To whom was this call for participation directed?

This issue of "translation memory" is another problem, another divergence of interests. We Wikipedians want to write good articles in our languages, which often means that we do not translate 1:1 but shorten and customize. But Google wants 1:1 translations for its translation memory. And, of course, it's the big numbers Google is interested in, to achieve better automatic translations in the end.

Ziko
That's absolutely a problem that should not be overlooked. Despite what I said in the other thread about content equivalency across languages, I think this is quite a different issue. A competent translator must take context and fluency into account, and often direct translations do not fit, even when they're grammatically correct. Language is a living organism consisting of more than just words and grammatical rules; we use lots of idioms and turns of phrase that are unique to our languages (or even our local dialects). Ignoring these things in a translation can generally give us output that is understandable, but not necessarily "good" - it can come out sounding stilted, awkward and contrived, at best.
The latest version of GTTK allows the merging of segments, i.e. two sentences in the original can be merged into one and translated accordingly. However, I think it's important not to lose sight of the fact that GTTK is just that: a toolkit. It is not the end-all solution for article creation on any wiki, nor is it an evil entity that goes around dumping poor-quality text on our projects. It is what we make it - I can use GTTK to produce a translation that is a good article in natural prose if I am willing to put in the time and effort to adapt text from one language to another, which is really the job of the translator anyhow.
This doesn't take away from the problem raised by M. Yahia about community, but I do wonder about that. Informational cannibalism between languages has been common in our community for a long time, ranging from borrowed parts of articles to translations of full articles. This didn't seem to be a problem in the past, before GTTK. What struck me were the phrases "very bad sentence structures" and "bad jargon translations". Aren't we talking about professional translators here, people who do this for a living? An excellent translator should not only know their source language well; they must also be intimately familiar with the ins and outs of their target language, beyond just the fact of being a native speaker. If a translation doesn't sound natural in the target language, that's not because it's a translation; it's because either 1) it's a poor translation or 2) it wasn't natural sounding in the source language, either! In most cases, I'd guess 1), since as a translator I'd rather compensate for people's grammatical mistakes than attempt to re-render them in another language.
The key to a great finished translation, in my opinion, is good proofreading. Before proofreading, your translation is like a block of unfinished wood. Rough, but still suitable for some uses. After proofreading, it should be polished. A good translation should leave the reader unable to tell whether the text was translated or if it was originally written in the target language, with only very rare exceptions.
-m.
What struck me were the phrases "very bad sentence structures" and "bad jargon translations". Aren't we talking about professional translators here, people who do this for a living?
That is an interesting question; I don't have an answer for it. Maybe because GTTK already 'proposes' a sentence structure, they try to work within it instead of deleting the whole paragraph (which is often what is needed) and rewriting it. The same goes for jargon and terminology: since they are not subject-matter experts, I am assuming they just accepted GTTK's suggestions.