Hi,

Two weeks ago Amir submitted a request to the mailing list asking folks to review the list of language names available in Names.php:

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=languages/Names.php;hb=HEAD

The request was followed by a few issues reported for Czech, Slovak, and some other languages. Clearly, by relying only on whatever input the community happens to provide, one cannot be sure that the rest of the data is correct. Given this, I recommended a minimal integration of CLDR data. Here's what I wrote to Amir:

I skimmed through the list and haven't seen anything incorrect. I have a question, though: given that some of this data is available in CLDR, have you ever considered integrating their data and then falling back to your own list where CLDR has no entry? The fallback would definitely be necessary in some cases because your list is *way* more extensive than what CLDR currently supports.

Of course, the CLDR spec makes it easy to add new locales. So the ideal would be to have a seed (with minimal information) for the locales which don't exist there but are present in the MW list. As CLDR is peer reviewed through surveys targeting in-country scholars and standards body representatives, the quality of the data and metadata is normally very good.

In the past, there was at least one extension I know of that facilitated the use of CLDR data in MW: http://www.mediawiki.org/wiki/Extension:CLDR

Let me know what you think. I'd be happy to help.

I haven't received any feedback from Amir so far, and as I'm not a MW developer, I'm writing here to ask for your opinion on the matter. The bottom line is that I can script something that cross-checks Names.php values against CLDR entries, but I think it'd be better to think about a long-term solution.
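
To make the cross-check idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under a few assumptions: that Names.php entries look like 'code' => 'Name', that a CLDR locale file keeps the language's own name under localeDisplayNames/languages/language, and that MW and CLDR codes can be compared directly (in practice some codes would need a mapping first). The function names are made up for this example.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Assumed entry shape in Names.php:  'code' => 'Name',
ENTRY_RE = re.compile(r"^\s*'([^']+)'\s*=>\s*'((?:[^'\\]|\\.)*)'", re.MULTILINE)


def parse_names_php(text):
    """Return {code: name} for every line matching the assumed entry shape."""
    return {code: name.replace("\\'", "'") for code, name in ENTRY_RE.findall(text)}


def cldr_autonym(ldml_text, code):
    """Return the name an LDML document gives for `code`, or None if absent."""
    root = ET.fromstring(ldml_text)
    for el in root.findall("./localeDisplayNames/languages/language"):
        # Skip alternate forms (alt="short", alt="variant", ...).
        if el.get("type") == code and el.get("alt") is None:
            return el.text
    return None


def normalize(name):
    """Compare names modulo Unicode composition, letter case and outer spaces."""
    return unicodedata.normalize("NFC", name).casefold().strip()


def cross_check(mw_names, cldr_names):
    """Split MW entries into mismatches and codes CLDR does not cover.

    Returns (mismatches, missing): mismatches maps code -> (mw, cldr),
    missing is a sorted list of codes that need the MW fallback.
    """
    mismatches = {}
    missing = []
    for code, mw in sorted(mw_names.items()):
        cldr = cldr_names.get(code)
        if cldr is None:
            missing.append(code)
        elif normalize(mw) != normalize(cldr):
            mismatches[code] = (mw, cldr)
    return mismatches, missing
```

The idea would be to build cldr_names by calling cldr_autonym once per locale file, then review the mismatches by hand; the missing list is exactly the set of locales for which the MW list would remain the fallback. The comparison deliberately ignores capitalization, which may or may not be what we want.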

Cheers,
Shervin