[Foundation-l] Google Translate now assists with human translations of Wikipedia articles

Gerard Meijssen gerard.meijssen at gmail.com
Mon Jun 15 11:53:34 UTC 2009

One of the most important things needed for adding languages to a
technology like this is a sufficiently large corpus. For general
availability, the expected quality is quite high. To me this seems to be
one reason why Google did not add more languages. Another reason many
corpora are not big enough is the problem of identifying the language a
text is written in. A few years ago I learned that only a small percentage
of Internet content carries metadata for the language that is used, and
that something like 75% of it is actually wrong...

Given that Google actually supports MediaWiki, it may be that they are
willing to support our languages. The problem, however, is that many of our
languages have illegal and even wrong codes. The consequence is that it is
not straightforward to just support our "language". This issue will not be
resolved because people are under the impression that the "community" has
the final word about the names of our languages. This is naive as well as
problematic because it makes it harder to argue for Google to support our

2009/6/15 Marcus Buck <me at marcusbuck.org>

> Gerard Meijssen wrote:
> > Hi,
> > The quality of the translations will vary. There are many reasons for
> > this, and one thing that will make a difference is the number of people
> > using the translate tool as a rough first pass. Once this is done, using
> > the translation functionality will help Google to improve the quality of
> > the code.
> >
> > This has been said before; there is no news here. What is relevant,
> > however, is that in order to support the languages that have not been
> > supported so far, people need to actually use this tool to build the
> > translation corpus that gets you this first-pass functionality.
> >
> > Translation is not something where a silver bullet will provide an
> > "instant on - high quality" experience, and it is the languages that are
> > currently not supported that have the highest need for tools like this.
> This is interesting. I did not know it's possible to train new
> languages. Is there any available information on the requirements? What
> requirements need to be met to make Google support them (so they can be
> selected in the drop-down at the translator toolkit)? _How much_ text do
> they need as a basis to finally enable the translation function?
> (My personal experience with the collaborativeness of Google is a bad
> one. Although Google is a multi-billion dollar company and [in a fair
> world] should actually _pay_ people for things like translating their
> interface into as many languages as possible [as Google with its 80%
> search engine market share is one of the most important internet access
> vectors and not having a search engine in your language is a big
> accessibility barrier], they rather choose to go the cheap way and let
> volunteers translate it. As if that were not enough, they have the
> chutzpah to _reject_ adding any further languages [no additions since at
> least 2007, although they still support Elmer Fudd, bork bork bork,
> Klingon and pirate speak...]. At the moment Google supports the languages
> of roughly 85 to 90% of the world's population, and it seems they don't
> care about the rest.)
> Marcus Buck
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
