[Foundation-l] Google Translate now assists with human translations of Wikipedia articles
Marcus Buck
me at marcusbuck.org
Mon Jun 15 13:15:04 UTC 2009
Gerard Meijssen wrote:
> Hoi,
> The proper use of language codes is indeed a recurring theme. Calling it a
> hobby horse gives the impression that it does not have a real world
> application. It does have a real world application and one of the problems
> with language is that it is truly hard to recognise languages confidently.
> Suggesting that Google can because of its size is too easy. I am sure they
> would have if they could.
> Thanks,
> GerardM
>
Let's assume Google wants to build an Alemannic translation tool and
is searching for an Alemannic text corpus. Will they fail to find the
Alemannic Wikipedia because 'als' stands for a form of Albanian? I don't
think so.
Don't get me wrong: I am _pro_ the use of correct codes, and I
reject the opinion that projects have the right to decide to
stick to a wrong code. But I also reject switching projects to codes
that don't match the project ('gsw', for example, is no proper substitute
for 'als'), and I reject code switches that harm the projects (that
means the old code has to remain a redirect to the new code for at least
several years).
And most importantly, I think that the question of ISO codes is not
related to Google's operations. If Google wants to use Wikipedia content
to improve their tools, it should be really easy for them to do the code
mapping (e.g. 'no'->'nb').
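A minimal sketch of what such a code mapping could look like, in Python.
The table name and the function are hypothetical, and only the
'no'->'nb' pair is taken from this message; any further entries would
have to be filled in by whoever consumes the data. Codes without an
override are passed through unchanged.

```python
# Hypothetical override table: Wikipedia subdomain code -> code the
# consuming tool expects. Only the 'no' -> 'nb' pair comes from the message.
WIKI_TO_TOOL_CODE = {
    "no": "nb",
}


def map_code(wiki_code: str) -> str:
    """Return the overridden code, or the (lowercased) input if none exists."""
    code = wiki_code.lower()
    return WIKI_TO_TOOL_CODE.get(code, code)


if __name__ == "__main__":
    for code in ("no", "als", "de"):
        print(code, "->", map_code(code))
```

Whether 'als' should get an override at all is exactly the open question
above, so the sketch leaves it untouched.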
So does anybody know how big a corpus must be to be helpful to Google?
Marcus Buck