Hi,
after a user reported a discrepancy between two versions of the autonym for the Czech language used in different contexts in MediaWiki, I noticed it comes from a difference between the MediaWiki core Names.php and CLDR data (via the CLDR extension’s CldrNames). Originally, I just intended to submit a patch to Names.php unifying those two to the CLDR version, but then I thought the problem might exist in other languages, too.
So, I compiled a list [1][2] of 39 differences in language names between MW core and the CLDR extension. Some of those differences are just case changes, some of them might be errors in one of the sources, some of them just random choices between two equally valid versions (like in the Czech case, it’s not like one of them is generally better than the other, it’s just that consistency would be better).
I definitely do not intend to change all those languages I do not understand, but maybe other people could be interested in checking their language in the list…
-- [[cs:User:Mormegil | Petr Kadlec]]
[1] http://translatewiki.net/wiki/User:Mormegil/CLDR_language_names_differences
[2] https://gist.github.com/mormegil-cz/7240262
Petr Kadlec, 30/10/2013 22:18:
> So, I compiled a list [1][2] of 39 differences in language names between MW core and the CLDR extension. Some of those differences are just case changes, some of them might be errors in one of the sources, some of them just random choices between two equally valid versions (like in the Czech case, it’s not like one of them is generally better than the other, it’s just that consistency would be better).
And some are just locales which don't exist in CLDR yet. Thanks for this list! Just to confirm, those you took from CLDR are all language autonyms, i.e. the language name in that language?
Nemo
On Thu, Oct 31, 2013 at 7:19 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
> Petr Kadlec, 30/10/2013 22:18:
>> So, I compiled a list [1][2] of 39 differences in language names between MW core and the CLDR extension. Some of those differences are just case changes, some of them might be errors in one of the sources, some of them just random choices between two equally valid versions (like in the Czech case, it’s not like one of them is generally better than the other, it’s just that consistency would be better).
> And some are just locales which don't exist in CLDR yet. Thanks for this list! Just to confirm, those you took from CLDR are all language autonyms, i.e. the language name in that language?
Yes, I compared the name of the language in Names.php with the name of the language in _the corresponding_ CldrNamesXx.php from the CLDR extension; note that I did not use the CLDR data directly, only the CLDR extension. (The script that created the list is on GitHub; a link is in the original post.)
-- [[cs:User:Mormegil | Petr Kadlec]]
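For anyone who wants to rerun this kind of check for their own language, here is a minimal Python sketch of the comparison described above. It is not Petr's script (that one is in the gist linked in the original post) and assumes both sources have already been loaded into plain dictionaries: `core_names` maps a MediaWiki language code to its Names.php autonym, and `cldr_by_locale` maps a locale code to that locale's CldrNamesXx table. The function name and the sample data are hypothetical, and mapping MediaWiki codes such as de-ch onto CLDR locale identifiers is not handled.

```python
from typing import Dict, List, Optional, Tuple


def autonym_differences(
    core_names: Dict[str, str],
    cldr_by_locale: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (code, core autonym, CLDR autonym or None) for every mismatch."""
    diffs = []
    for code, core_name in sorted(core_names.items()):
        # The autonym is the language's own name in its own locale's table,
        # i.e. the entry for `code` inside the "CldrNames<code>" data.
        cldr_name = cldr_by_locale.get(code, {}).get(code)
        if cldr_name != core_name:
            # None means the locale (or its own entry) is missing from CLDR.
            diffs.append((code, core_name, cldr_name))
    return diffs


# Hypothetical sample data, not taken from either source.
sample_core = {"xx": "Foo", "yy": "bar", "zz": "Baz"}
sample_cldr = {"xx": {"xx": "Foo"}, "yy": {"yy": "Bar"}}
print(autonym_differences(sample_core, sample_cldr))
```

Here "xx" matches and is skipped, "yy" differs only in casing, and "zz" has no CLDR locale at all, which corresponds to the "locales which don't exist in CLDR yet" Nemo mentioned.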
Petr Kadlec, 31/10/2013 10:09:
> Yes, I compared the name of the language in Names.php with the name of the language in _the corresponding_ CldrNamesXx.php from the CLDR extension, i.e. note I did not use CLDR data directly, only the CLDR extension. (The script which created the list can be seen on github, a link is in the original post.)
Heh, but how lazy we list idlers are: nobody else helped edit. So, to break down the to-dos, can you do the following:
1) torture Nikerabbit and purodha until they emit a conclusion on Ripoarisch vs. Kölsch and sámegiella vs. davvisámegiella;
2) submit a patch to drop the MediaWiki names which differ from CLDR only in casing (AFAICS: Esperanto, Ирон, shqip, SiSwati, Кыргызча);
3) file a bug in MediaWiki for the locale(s) which disagree with CLDR on the main script, i.e. shi Tašlḥiyt/ⵜⴰⵛⵍⵃⵉⵜ vs. ⵜⴰⵎⴰⵣⵉⵖⵜ (we already have shi-latn);
4) where names differ "only" in diacritics, file a bug against the project whose version has fewer of them, I guess :), i.e. Qafar, Gikuyu;
5) where one of them is less specific (probably different cases or "language X" vs. "X" differences), no idea;
6) ignore the stylistic differences (the two zh-* variants);
7) ignore the variants undefined in CLDR (de-ch, en-gb, kk-cyrl, shi-latn, sr-ec, tg-cyrl);
8) file a bug in MediaWiki for all the others?
Nemo
"Petr Kadlec" petr.kadlec@gmail.com writes:
> Petr Kadlec, 31/10/2013 10:09:
>> Yes, I compared the name of the language in Names.php with the name of the language in _the corresponding_ CldrNamesXx.php from the CLDR extension, i.e. note I did not use CLDR data directly, only the CLDR extension. (The script which created the list can be seen on github, a link is in the original post.)
> Heh, but how lazy we list idlers are, nobody else helped edit. So, to break down the todos, can you do the following:
> 1) torture Nikerabbit and purodha until they emit a conclusion on Ripoarisch vs. Kölsch ...
Ripuarian is a scientific collective term for a language family that Colognian (Kölsch) is part of. So the correct names for ksh are "Colognian" in English and "Kölsch" as the autonym. The problem is that there is no ISO 639 code for Ripuarian, so the WMF projects use ksh as a substitute until one has been assigned. The correct English name for Ripuarian is "Ripuarian" or "Ripuarian Franconian". An autonym does not really exist, since most people are not aware of the linguistic family relations, but a common denominator would be "Platt" or "plat", which is similar to "vernacular" and is shared by a huge variety of Dutch, Belgian, and West German local languages, including Colognian and almost all other Ripuarian varieties.
> ... and sámegiella vs. davvisámegiella;
> 2) submit a patch to drop the MediaWiki names which differ from CLDR only in casing (AFAICS: Esperanto, Ирон, shqip, SiSwati, Кыргызча);
Agreed that we can do that, but I would recommend warning the communities about it, respecting their feedback, and possibly encouraging them to get CLDR updated.
> 3) file a bug in MediaWiki for the locale(s) which disagree with CLDR on main script, i.e. shi Tašlḥiyt/ⵜⴰⵛⵍⵃⵉⵜ vs. ⵜⴰⵎⴰⵣⵉⵖⵜ (we already have shi-latn);
> 4) where names differ "only" in diacritics, file a bug against the project that has less I guess :), i.e. Qafar, Gikuyu;
> 5) where either has less specifications (probably different cases or "language X" vs. "X" differences), no idea;
Ask the communities? Possibly recommend updating CLDR once the existing Names.php contents are confirmed?
Purodha
mediawiki-i18n@lists.wikimedia.org