CLDR and Names.php


Shervin Afshar

13 Apr 2012

7:22 p.m.

Hi,

Two weeks ago Amir submitted a request to the mailing list asking folks to review the list of language names available in Names.php:

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=language...

The request was followed by a few issues noticed for Czech, Slovak, and some other languages. It's obvious that by relying only on available input from the community, one cannot make sure that the rest of the data is correct. Given this, I recommended doing a minimal implementation of CLDR data. Here's what I wrote to Amir:

I skimmed through the list and haven't seen anything incorrect. I have a question though: considering that some of this data is available in CLDR, have you ever considered integrating their data and then falling back to your own list? The fallback would definitely be necessary in some cases because your list is *way* more extensive than what CLDR currently supports.

Of course, the CLDR spec makes it easy to add new locales. So the ideal would be to have a seed (with minimal information) for the locales that don't exist there but are present in the MW list. As CLDR is peer reviewed through surveys targeting in-country scholars and standards body representatives, the quality of the data and metadata is normally very good.

In the past, there was at least one extension I know of that facilitated the use of CLDR data on MW: http://www.mediawiki.org/wiki/Extension:CLDR

Let me know what you think. I'd be happy to help.

I haven't received any feedback from Amir so far, and as I'm not a MW developer, I'm writing here to ask for your opinion on the matter. The bottom line is that I can script something that cross-checks Names.php values against CLDR entries, but I think it'd be better to think about a long-term solution.

Cheers, Shervin
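For readers who want to picture the cross-check mentioned above, here is a minimal sketch in Python. It is not taken from the thread: the function name `cross_check`, the sample values, and the choice to compare names case-insensitively are all assumptions made for illustration, and both inputs are assumed to be already loaded as plain code-to-name mappings.

```python
from typing import Dict, List, Tuple


def cross_check(
    mw_names: Dict[str, str], cldr_names: Dict[str, str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Compare two language-code -> name mappings.

    Returns (mismatched, missing):
      mismatched: (code, mw_name, cldr_name) where both sources have the
                  code but the names differ, ignoring letter case.
      missing:    codes present in mw_names but absent from cldr_names.
    """
    mismatched: List[Tuple[str, str, str]] = []
    missing: List[str] = []
    for code in sorted(mw_names):
        mw_name = mw_names[code]
        cldr_name = cldr_names.get(code)
        if cldr_name is None:
            missing.append(code)
        elif cldr_name.casefold() != mw_name.casefold():
            mismatched.append((code, mw_name, cldr_name))
    return mismatched, missing


# Illustrative sample data only (hypothetical values, not the real files).
sample_mw = {"cs": "Česky", "sk": "Slovenčina", "ast": "Asturianu"}
sample_cldr = {"cs": "čeština", "sk": "slovenčina"}

mismatched, missing = cross_check(sample_mw, sample_cldr)
for code, mw_name, cldr_name in mismatched:
    print(f"{code}: MW has {mw_name!r}, CLDR has {cldr_name!r}")
for code in missing:
    print(f"{code}: not in CLDR, a candidate for a seed locale")
```

The second list is the interesting one for the long-term question: it enumerates the codes that would need a seed entry in CLDR or a local fallback.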


Gerard Meijssen

14 Apr

9:21 a.m.

Hoi, We do use the CLDR data and we REALLY, REALLY want people to assess the data that is in the CLDR both for correctness and completeness. Many of the languages we support in MediaWiki are not yet supported in the CLDR. We are actively seeking this support. We would also be REALLY happy if any and all languages were supported in the CLDR.

And yes, the CLDR extension is still very much in use. We have regularly overridden the CLDR data because of problems with it. Recently an override was put in place on the Incubator to change Aurocana to Mapundungun. Thanks, Gerard

PS you CAN change CLDR data at this moment ... http://cldr.unicode.org/index/survey-tool

Mediawiki-i18n mailing list Mediawiki-i18n@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n

സുനിൽ (Sunil)

12:18 p.m.

Thanks Gerard


Xuacu Saturio

15 Apr

12:10 a.m.

Hi Gerard, all

Some time ago I was trying to get the Asturian language included in CLDR [1].

Unfortunately, it took longer than expected to solve a bug with our alphabet [2], and we didn't make it in time for r2.0.

Perhaps now is the right time to try again, but I'm not a developer and it's **really** hard for me to create the requested data. I guess I can get the collation chart from [3], but... anything else? Do you know of a step-by-step guide for absolute beginners on how to provide the required data? I find the Survey Tool (and most of the tools in CLDR) more complicated than it needs to be for non-technical people.

Thanks in advance for any clue you can give me. Best regards.

[1] http://unicode.org/cldr/trac/ticket/2868
[2] https://bugs.freedesktop.org/show_bug.cgi?id=32965
[3] http://developer.mimer.com/charts/asturian.htm

-- 
Xuacu Saturio
Sent while testing Thunderbird
Unviáu dende Thunderbird en pruebes

Gerard Meijssen

7:38 a.m.

Hoi, I blogged about this: http://ultimategerardm.blogspot.com/2012/04/supporting-asturian-in-cldr.html Thanks, Gerard


സുനിൽ (Sunil)

7:47 a.m.

I have requested an account there. How long does it take for a request to be processed?


Xuacu Saturio

8:57 p.m.

Nice!

You describe quite accurately how daunting this process may be for any person trying to include a new language.

Moreover, the first attempts to include a new language are often made by people like me (translators, writers...) who lack deep knowledge of the technical side of Unicode standards and have to learn it the hard way. Oddly enough, things become easier once you pass the entry point and can use the Survey Tool :)

Best regards.


-- 
Xuacu Saturio
Sent while testing Thunderbird
Unviáu dende Thunderbird en pruebes

Robin Pepermans

14 Apr

11:12 p.m.

Hi,

Some time ago I made a comparison of MediaWiki and CLDR names, see http://translatewiki.net/wiki/User:SPQRobin/languages . I also find it important to have the right language names, and to be consistent and efficient. Something I've been thinking about for some time is to change names in MediaWiki from always starting with an uppercase letter to lowercase where that's normal (as in CLDR).

Also, I am planning to include all English language names (from ISO 639-3) in MediaWiki. Including native names from CLDR in core is more difficult, but we could perhaps do that in the future and then fall back where possible, as you propose.
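The fallback idea discussed in this thread can be sketched as a simple lookup chain. This is only an illustration, not MediaWiki or CLDR extension code: the function name `resolve_name`, the lookup order (local overrides first, then CLDR, then the Names.php list), and the placeholder values are assumptions.

```python
from typing import Mapping, Optional


def resolve_name(
    code: str,
    overrides: Mapping[str, str],
    cldr: Mapping[str, str],
    names_php: Mapping[str, str],
) -> Optional[str]:
    """Return the first non-empty name found for a language code.

    Sources are consulted in order: local overrides, CLDR data, then the
    Names.php list. Returns None when no source knows the code.
    """
    for source in (overrides, cldr, names_php):
        name = source.get(code)
        if name:
            return name
    return None


# Placeholder data for illustration only.
overrides = {"arn": "name from local override"}
cldr = {"arn": "name from CLDR", "cs": "čeština"}
names_php = {"cs": "česky", "ast": "asturianu"}

for code in ("arn", "cs", "ast", "zz"):
    print(code, "->", resolve_name(code, overrides, cldr, names_php))
```

Putting overrides first matches the practice described earlier in the thread of overriding CLDR where its data is wrong; a different priority order would only require reordering the tuple of sources.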

