[Foundation-l] Following the conventions: separating Wikisource
gerard.meijssen at gmail.com
Wed Jun 6 19:38:31 UTC 2007
I am well aware of where Google stands on supporting languages. I have
discussed this for two years now with one of their language engineers. You
underestimate the importance that proper language codes should have. You
are not aware of the importance that is given to the projects of the Wikimedia
Foundation. It is exactly because we aim to do justice to and promote language
diversity that we invest in Multilingual MediaWiki. And it is with a lot of
frustration that for all kinds of reasons, good and bad, it is still not
If Google and the Internet are only about being able to find things on the
Internet, then only languages with a more or less fixed orthography will be
found. Most content in other languages can only be found like a needle in
the proverbial haystack. This problem is made worse by people who
mean well but have no clue about the complexity of the problem.
Indicating what language a text is in is vitally important. It is
particularly important for those languages that do not have much of a
footprint on the Internet.
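
To make this concrete, here is a minimal sketch of what marking a text's language could look like, using the lang / xml:lang attributes mentioned in the quoted message below. The helper name wrap_with_lang and the choice of a div wrapper are my own assumptions for illustration; this is not how MediaWiki or any existing bot does it.

```python
from html import escape


def wrap_with_lang(text: str, lang_code: str) -> str:
    """Wrap text in a <div> that declares its language.

    Hypothetical helper: the <div> wrapper is an assumption; only the
    lang / xml:lang attribute pair comes from the discussion itself.
    The text is passed through unchanged.
    """
    code = escape(lang_code, quote=True)  # keep the attribute value well-formed
    return f'<div lang="{code}" xml:lang="{code}">\n{text}\n</div>'


if __name__ == "__main__":
    # "sa" is the language code for Sanskrit used in the quoted message.
    print(wrap_with_lang("ईशावास्य उपनिषद्", "sa"))
```

A bot would still have to know which language code applies to which page, and that is the hard part when many languages share one database.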
On 6/6/07, Yann Forget <yann at forget-me.net> wrote:
> GerardM a écrit :
> > Hoi,
> > When you look at the details for the HTML it will tell you that the
> > is English. It is obviously not. Technically all content in
> > Wikisource.org that is not English should be marked for the language
> > that it is.
> > Having content marked English while it is in actual fact not English
> > that the meta-data of the page is wrong. Having multiple languages
> > the same MediaWiki database is technically a disaster. It is not marked
> > any way what language it is. This is in and of itself bad.
> Ok, I see what you mean, but you greatly exaggerate the importance of
> that, especially because Google doesn't know any Indian language, nor
> any dead language. It doesn't make any distinction between Ancient Greek and
> modern Greek, doesn't know Belarusian nor Kazakh (close to Russian),
> doesn't know Urdu nor Kurdish, etc.
> The only really useful cases are when the same word exists in different
> languages. Most of these cases are for languages separated into subdomains.
> For the other possibilities, there is little risk of confusing Sanskrit
> and Armenian, for example.
> For the rest, searching for example for ईशावास्य उपनिषद् works fairly
> well, it even gives Wikisource as the first answer. ;o)
> PS: As JHS told me, we need to add lang="sa" xml:lang="sa" to each
> page. That could easily be done with a bot.
> > Thanks,
> > GerardM
> > On 6/6/07, Yann Forget <yann at forget-me.net> wrote:
> >> Hello,
> >> GerardM a écrit :
> >>> Hoi,
> >>> It is exactly to find out if it is an "otherwise accepted language"
> >> the
> >>> language committee wants to make sure that the content is coded in
> >>> way.. I would not be surprised when all the content in
> >>> is NOT English is not coded correctly in the first place.
> >>> Thanks,
> >>> GerardM
> >> I don't understand what you want to do here.
> >> Which code are you talking about?
> >> What can you do about the coding of this?
> >> or this?
> >> Regards,
> >> Yann
> http://www.non-violence.org/ | Site collaboratif sur la non-violence
> http://www.forget-me.net/ | Alternatives sur le Net
> http://fr.wikipedia.org/ | Encyclopédie libre
> http://fr.wikisource.org/ | Bibliothèque libre
> http://wikilivres.info | Documents libres