[Sorry, I posted the wrong links for the first two searches in my previous message.]
Because in many cases Google does find the page in question when the search is restricted to some other language.
According to Google, the Hungarian Wikipedia has 3870 articles written "in English": http://www.google.com/search?lr=lang_en&as_sitesearch=hu.wikipedia.org
Over 52,000 articles "in Czech": http://www.google.com/search?lr=lang_cs&as_sitesearch=hu.wikipedia.org
Two articles which Google thinks are in Chinese (simplified): http://www.google.com/search?lr=lang_zh-CN&as_sitesearch=hu.wikipedia.or...
The Hungarian article about the national anthem of Russia is supposedly in traditional Chinese: http://www.google.com/search?lr=lang_zh-CN&as_sitesearch=hu.wikipedia.or...
And so on.
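For anyone who wants to repeat these checks for other languages: the links above all follow the same pattern. Here is a minimal sketch, assuming only the two query parameters visible in those links (`lr` and `as_sitesearch`). The function name is made up for illustration, and the script only builds the URLs; it does not fetch anything or check whether Google still honours these parameters.

```python
from urllib.parse import urlencode


def language_restricted_search_url(site, lang):
    """Build a Google search URL limited to `site` and to pages
    that Google classifies as being written in `lang`.

    Hypothetical helper: it mirrors the `lr=lang_XX` and
    `as_sitesearch=...` parameters seen in the links above.
    """
    params = {"lr": f"lang_{lang}", "as_sitesearch": site}
    return "http://www.google.com/search?" + urlencode(params)


# Print one URL per language code to compare the result counts by hand.
for lang in ("hu", "en", "cs", "zh-CN"):
    print(language_restricted_search_url("hu.wikipedia.org", lang))
```

Opening each printed URL and comparing the reported result counts should show how many huwiki pages Google assigns to each language.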
Regards, Endre (KovacsUr@huwiki)
----- Original Message -----
From: "Angela" beesley@gmail.com
To: "Wikimedia developers" wikitech-l@wikimedia.org
Sent: Friday, September 02, 2005 10:22 PM
Subject: Re: [Wikitech-l] Assisting Google's language recognition?
Google miscategorizes the language of some of the Hungarian Wikipedia pages. E.g. it thinks that our Adolf Hitler article is in Czech.
How do you know they are miscategorising the language?
<http://www.google.com/search?q=inurl%3A%22Adolf+Hitler%22+site%3Ahu.wikipedia.org>
This makes it seem like they haven't indexed the page at all, not that they've marked it as the wrong language.
Angela.