Hoi, I am well aware of where Google stands on supporting languages. I have discussed this for two years now with one of their language engineers. You underestimate the importance that proper language codes should have. You are not aware of the importance that is given to the projects of the Wikimedia Foundation. It is exactly because we aim to do justice to and promote language diversity that we invest in Multilingual MediaWiki. And it is with a lot of frustration that I see that, for all kinds of reasons, good and bad, it is still not finished.
If Google and the Internet are only about being able to find things on the Internet, then only languages with a more or less fixed orthography will be found. Most content in other languages can only be found like a needle in the proverbial haystack. This problem is made worse by people who mean well but have no clue about the complexity of the problem.
Indicating what language a text is in is vitally important. It is particularly important for those languages that do not have much of a footprint on the Internet.
Thanks, GerardM
On 6/6/07, Yann Forget yann@forget-me.net wrote:
GerardM wrote:
Hoi, When you look at the details for the HTML it will tell you that the language is English. It is obviously not. Technically all content in Wikisource.org that is not English should be marked for the language that it is. Having content marked English while it is in actual fact not English means that the meta-data of the page is wrong. Having multiple languages within the same MediaWiki database is technically a disaster. It is not marked in any way what language it is. This is in and of itself bad.
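To make the point about page metadata concrete, here is a minimal sketch in Python of how one might read the language a page declares on its root element. The sample page is hypothetical and only stands in for a Wikisource page; the class and function names are illustrative, not part of MediaWiki.

```python
from html.parser import HTMLParser


class RootLangParser(HTMLParser):
    """Record the lang and xml:lang attributes of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.declared = None  # stays None until an <html> tag is seen

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.declared is None:
            attr_map = dict(attrs)
            self.declared = {
                "lang": attr_map.get("lang"),
                "xml:lang": attr_map.get("xml:lang"),
            }


def declared_language(html_text):
    """Return the language attributes declared on the root element, or None."""
    parser = RootLangParser()
    parser.feed(html_text)
    return parser.declared


# Hypothetical page: Sanskrit text served with an English language declaration.
sample_page = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<body><p>ईशावास्य उपनिषद्</p></body></html>"
)

print(declared_language(sample_page))
```

A check like this only reports what the page claims about itself; it does not detect the language the text is actually written in.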
Ok, I see what you mean, but you greatly exaggerate the importance of that, especially because Google doesn't know any Indian language, nor any dead language. It makes no distinction between Ancient Greek and modern Greek, doesn't know Belarusian or Kazakh (close to Russian), doesn't know Urdu or Kurdish, etc.
The only really useful cases are when the same word exists in different languages. Most of these cases involve languages that are already separated into subdomains. For the other possibilities, there is little risk of confusing Sanskrit and Armenian, for example.
For the rest, searching for example for ईशावास्य उपनिषद् works fairly well; it even gives Wikisource as the first answer. ;o)
Regards,
Yann
PS: As JHS told me, we need to add lang="sa" xml:lang="sa" to each page. That could easily be done with a bot.
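As a rough sketch of the text transformation such a bot might apply, the Python function below wraps a page's text in an element carrying the two attributes. Putting them on a wrapping div is an assumption, since the message does not say where on the page they should go, and tag_language is a hypothetical helper, not an existing bot or MediaWiki function. Fetching and saving pages is left out.

```python
def tag_language(wikitext, code):
    """Wrap page text in a <div> that declares its language.

    Text that already starts with a language-tagged <div> is returned
    unchanged, so running the bot twice does not nest wrappers.
    """
    # Assumption: the attributes are placed on a wrapping <div>.
    opening = '<div lang="{0}" xml:lang="{0}">'.format(code)
    if wikitext.lstrip().startswith('<div lang="'):
        return wikitext
    return opening + "\n" + wikitext.rstrip("\n") + "\n</div>"


page_text = "ईशावास्य उपनिषद्"
print(tag_language(page_text, "sa"))
```

The early return keeps the operation idempotent, which matters for a bot that may visit the same page more than once.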
Thanks, GerardM
On 6/6/07, Yann Forget yann@forget-me.net wrote:
Hello,
GerardM wrote:
Hoi, It is exactly to find out if it is an "otherwise accepted language" that the language committee wants to make sure that the content is coded in this way. I would not be surprised if all the content in wikisource.org that is NOT English is not coded correctly in the first place. Thanks, GerardM
I don't understand what you want to do here. Which code are you talking about?
What can you do about the coding of this?
http://wikisource.org/wiki/%E0%A4%88%E0%A4%B6%E0%A4%BE%E0%A4%B5%E0%A4%BE%E0%...
or this?
http://wikisource.org/wiki/%D4%B1%D4%BC%D4%BC%D4%B1%D5%80%D4%BB%D5%91_%D5%82...
Regards,
Yann
--
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikipedia.org/ | Free encyclopedia
http://fr.wikisource.org/ | Free library
http://wikilivres.info | Free documents
foundation-l mailing list foundation-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/foundation-l