[Foundation-l] Google Translate now assists with human translations of Wikipedia articles

Amir E. Aharoni amir.aharoni at gmail.com
Wed Jun 10 11:08:38 UTC 2009

On Tue, Jun 9, 2009 at 23:42, Brian <Brian.Mingus at colorado.edu> wrote:
> Google has built in support for using its machine translation technology to
> help bootstrap human translations of Wikipedia articles.
> http://translate.google.com/toolkit/docupload
> The benefit to Google is clear - they need sentence-aligned text in multiple
> languages in order to bootstrap their automated system.
> This is a great example of machines helping people help machines help
> people, etc... I'm sure this is now the most efficient way to produce high
> quality translations of Wikipedia articles en masse.
> We should check the ToS to make sure the translated text can be CC-BY-SA
> licensed.

OK, after a bit of drama in this discussion, i actually tried this toolkit.

First i tried to translate the Hebrew article [[שלום גד]] into English
(that's Shalom Gad, one of my favorite Israeli musicians). Apparently,
it can only translate from English. I am more interested in
translating Wikipedia articles from Hebrew into English, so it was
quite disappointing, but they'll probably fix it soon enough.

Then i tried to translate [[Art critic]] from English into Hebrew.
There were a few pleasant surprises, but on the whole the machine
translation was so bad that it was unusable. It is much easier
to translate the article by hand in vi.

Google want side-by-side translations, but that is not really possible.
The grammar of a language is not just subjects, objects, tenses and
adjectives. Google seem to ignore [[Text linguistics]]: rules that
apply well beyond the word and the sentence. And these are *grammar
rules*, not just "style". (Disclaimer: The Department of Linguistics
at the Hebrew University of Jerusalem, where i study, is very keen on
this subject.)

I *had* to make very deep changes to paragraph structure, not to
mention sentence structure, and not just because the Hebrew
Wikipedia has a different MOS, but because such structure is
fundamental to the Hebrew language. A text without these changes would
be next to unreadable. I doubt that a document changed this deeply is
very useful to Google at this point. I certainly know that it is not
useful to me: i gave up after two paragraphs.

So yes, Google can revise the legalese of their TOS, but this is not a
very urgent problem. The uselessness of the technology makes the TOS
pretty irrelevant.

אמיר אלישע אהרוני
Amir Elisha Aharoni


"We're living in pieces,
 I want to live in peace." - T. Moore
