MediaWiki is upgrading its plural rules to match CLDR version 26. The updates include incompatible changes for plural forms in Russian, Prussian, Tagalog, Manx and several languages that fall back to Russian. In addition there are minor changes for other languages.
In January 2014, CLDR 24 had introduced several changes in the plural forms for some of these languages, including Russian, and we had updated MediaWiki's plural rules to comply with the CLDR standard. Some of these changes are now being reverted. Below is a detailed explanation of the changes.
For the migration period, from Monday, 27th October 2014 to Thursday 6th November 2014, we have disabled LocalisationUpdate at Wikimedia wikis to reduce the chance of ungrammatical translations being displayed in the interface. Translators are requested to start updating translations from Tuesday 28th October 2014 onward.
Further updates will be posted on https://translatewiki.net/wiki/Thread:Support/Plural_rule_changes_for_many_l...
== Russian and languages using Russian as fallback ==
Languages affected: Russian (ru), Abkhaz (ab), Avaric (av), Bashkir (ba), Buryat (bxr), Chechen (ce), Crimean Tatar (crh-cyrl), Chuvash (cv), Ingush (inh), Komi-Permyak (koi), Karachay-Balkar (krc), Komi (kv), Lak (lbe), Lezghian (lez), Eastern Mari (mhr), Western Mari (mrj), Yakut (sah), Tatar (tt), Tatar-Cyrillic (tt-cyrl), Tuvinian (tyv), Udmurt (udm), Kalmyk (xal).
CLDR 24 plural forms for Russian were:
* Form 1: @integer 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, …
* Form 2: @integer 0, 5~19, 100, 1000, 10000, 100000, 1000000, …
* Form 3: @integer 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, … @decimal 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …
This has been changed to:
* Form 1: @integer 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, …
* Form 2: @integer 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, …
* Form 3: @integer 0, 5~19, 100, 1000, 10000, 100000, 1000000, …
* Form 4: @decimal 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …
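For readers who prefer code to sample lists, the new Russian forms can be sketched as a small function. This is only an illustration, not MediaWiki's implementation: the function name is hypothetical, the input is assumed to be a non-negative numeral written as a string (so that "1.0" keeps its visible decimal), and the conditions are the CLDR Russian rules for the categories one, few, many and other, assumed to correspond to Forms 1 to 4 in that order.

```python
def russian_plural_form(number: str) -> int:
    """Return the form index (1-4) from the new list for a numeral string.

    Hypothetical helper for illustration; assumes a non-negative numeral
    such as "21" or "1.5".
    """
    integer_part, _, fraction = number.partition(".")
    if fraction:
        # Any visible fraction digits, even "1.0", select the decimal form.
        return 4
    i = int(integer_part)
    if i % 10 == 1 and i % 100 != 11:
        return 1  # 1, 21, 31, ... but not 11
    if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
        return 2  # 2~4, 22~24, ... but not 12~14
    return 3      # 0, 5~19, 100, ...
```

Passing the number as a string is a deliberate choice in this sketch: a float would lose the difference between "1" and "1.0", which the rules treat differently.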
Plurals in translations for affected languages have been updated automatically where possible. Translators are requested to check all messages containing plurals, starting from those which have been marked as outdated.
== Prussian ==
Prussian (prg) now follows the same rules as Latvian (lv):
* Form 1: @integer 0, 10~20, 30, 40, 50, 60, 100, 1000, 10000, 100000, 1000000, … @decimal 0.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …
* Form 2: @integer 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, … @decimal 0.1, 1.0, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 10.1, 100.1, 1000.1, …
* Form 3: @integer 2~9, 22~29, 102, 1002, … @decimal 0.2~0.9, 1.2~1.9, 10.2, 100.2, 1000.2, …
Translators are requested to update all translations containing plural rules. Those translations have been marked as outdated.
== Tagalog ==
Tagalog (tl) has new rules as follows:
* Form 1: @integer 0~3, 5, 7, 8, 10~13, 15, 17, 18, 20, 21, 100, 1000, 10000, 100000, 1000000, … @decimal 0.0~0.3, 0.5, 0.7, 0.8, 1.0~1.3, 1.5, 1.7, 1.8, 2.0, 2.1, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …
* Form 2: @integer 4, 6, 9, 14, 16, 19, 24, 26, 104, 1004, … @decimal 0.4, 0.6, 0.9, 1.4, 1.6, 1.9, 2.4, 2.6, 10.4, 100.4, 1000.4, …
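The Tagalog samples above follow a simple pattern that is easier to see in code: only the last visible digit matters. The sketch below is an illustration under assumptions, not MediaWiki's code. The function name is hypothetical, the input is assumed to be a non-negative numeral string, and the condition is the CLDR rule for Tagalog as I understand it (last integer digit, or last fraction digit when a fraction is shown, not in 4, 6, 9).

```python
def tagalog_plural_form(number: str) -> int:
    """Return 1 or 2, matching the two Tagalog forms listed above.

    Hypothetical helper for illustration; assumes a non-negative numeral
    string such as "14" or "2.6".
    """
    integer_part, _, fraction = number.partition(".")
    # With visible fraction digits the rule looks at the last fraction
    # digit; otherwise it looks at the last integer digit.
    digits = fraction if fraction else integer_part
    last_digit = int(digits) % 10
    return 2 if last_digit in (4, 6, 9) else 1
```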
Translators are requested to update all translations containing plural rules. Those translations have been marked as outdated.
== Manx ==
Manx (gv) has a new (fourth) form for decimals. New rules are as follows:
* Form 1: @integer 1, 11, 21, 31, 41, 51, 61, 71, 101, 1001, …
* Form 2: @integer 2, 12, 22, 32, 42, 52, 62, 72, 102, 1002, …
* Form 3: @integer 0, 20, 40, 60, 80, 100, 120, 140, 1000, 10000, 100000, 1000000, …
* Form 4: @decimal 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …
* Form 5: @integer 3~10, 13~19, 23, 103, 1003, …
Translators are requested to update all translations containing plural rules. Those translations have been marked as outdated.
== Other languages ==
* In Mirandese (mwl), Portuguese (pt) and Brazilian Portuguese (pt-br), the first form now also includes zero.
* In Uyghur (ug), Lower Sorbian (dsb) and Upper Sorbian (hsb), support for decimals was added.
* In Asturian (ast) and Western Frisian (fy), the first form is no longer used for decimals.
Translators are encouraged to review translations with plural forms and update them where necessary. Because the changes have been minor, we have not marked those translations as outdated.
-Niklas
Regarding Russian: what I see is a separation of decimals, which should be a minor change (as for Uyghur or Asturian). But I don't understand why the 2nd and 3rd forms needed to be swapped, why an automated bot could not perform this swap in the resources, and why the existing 3rd form (now the 2nd) could not be used to create a new default value (to be reviewed) for the 4th form, for numbers with decimals.
Translators-l mailing list Translators-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/translators-l
2014-10-29 10:37 GMT+02:00 Philippe Verdy verdy_p@wanadoo.fr:
Regarding Russian: what I see is a separation of decimals, which should be a minor change (as for Uyghur or Asturian). But I don't understand why the 2nd and 3rd forms needed to be swapped, why an automated bot could not perform this swap in the resources, and why the existing 3rd form (now the 2nd) could not be used to create a new default value (to be reviewed) for the 4th form, for numbers with decimals.
In case I was not clear: we ran a script to swap the second and third forms for Russian and the other languages that had the same change. Since decimal numbers are rarely used in MediaWiki, we have left it up to translators to add a fourth form where they think it is necessary.
The way MediaWiki handles plurals that supply fewer forms than expected makes it unnecessary to append any default values. In fact, the system will complain about those and mark the messages as outdated.
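To make the swap concrete, here is a rough sketch of what such a script could look like. It is not the script that was actually run: the function name and the regular expression are assumptions, it only handles simple `{{PLURAL:$1|...}}` expressions written in upper case without nested templates or piped links, and it leaves expressions with fewer than three forms untouched.

```python
import re

# Matches a simple {{PLURAL:<count>|<form>|<form>|...}} expression.
# Assumption: no nested braces or piped links inside the forms.
PLURAL_RE = re.compile(r"\{\{PLURAL:([^|{}]*)\|([^{}]*)\}\}")


def swap_second_and_third(message: str) -> str:
    """Swap the 2nd and 3rd plural forms in every simple PLURAL expression."""

    def fix(match: "re.Match[str]") -> str:
        forms = match.group(2).split("|")
        if len(forms) >= 3:
            forms[1], forms[2] = forms[2], forms[1]
        return "{{PLURAL:%s|%s}}" % (match.group(1), "|".join(forms))

    return PLURAL_RE.sub(fix, message)
```

Any fourth form already present stays in fourth position, which matches the idea that translators add the decimal form themselves where they need it.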
-Niklas
Note also that CLDR does not use any numbering for plural forms! The numbering is a purely internal mapping of symbols (one, two, zero, few, many, other...) to integers, used by Gettext and the existing online translation tools (which in my opinion would do better to use the symbols). The mapping from symbols to integers should be left as a separate adaptation layer for each language, with its own local resources (even if this is done by code). And in all languages there should be a mapping of the "other" symbolic name, to make sure it is defined everywhere. In my opinion the "other" rule should be mapped everywhere to form number 0, so that if a resource is not yet translated in one specific form, it can fall back to the "other" form (and always before falling back to other languages when no form is translated at all). The "other" form should then be mandatory before translating any other forms (numbered 1, 2, 3... according to the language-specific mapping of symbols to this numbering). As translated resources would not use the numbering directly but only the symbols (possibly abbreviated to one character: "o" = "one", "d" = "dual" = "two", "f" = "few", "m" = "many", ... except "other", mapped as the default with no character at all and numbered 0), there would be no need to renumber or swap resources. For many resources we would no longer need to translate every possible form, and the language-specific mapping of plural forms could use specific fallbacks to a form other than "other", which would be used as a last-resort fallback.
With these fallbacks, the job for translators would be simplified, because most often only one form will actually be used (the "other" form, generally the most common plural form, used also for undetermined numbers); and if a second form is used, it is also used for the remaining forms via their fallbacks (this includes most Slavic languages, which work this way). Note that the fallback from one plural form to another is specific to the grammatical case; in fact, in Slavic languages case and number are directly correlated, such that a genitive singular can be used as a nominative plural. I'd then suggest that the translation interface allow selecting BOTH the number and the grammatical case to map them to one of the possible forms, and then let the adapter interface make the necessary mapping to the precise symbolic form or one of its fallbacks.
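A minimal sketch of the lookup I have in mind, to make the proposal concrete. Everything here is hypothetical (the function name, the dictionary layout, the per-language fallback table); it is not how MediaWiki or Gettext work today. Translations are keyed by the CLDR symbol, "other" is mandatory, and a missing form follows its language-specific fallback before ending at "other".

```python
from typing import Dict, Optional


def pick_form(
    translations: Dict[str, str],
    category: str,
    fallbacks: Optional[Dict[str, str]] = None,
) -> str:
    """Return the translation for a CLDR category, following fallbacks.

    `fallbacks` maps a category to the category to try next (hypothetical
    per-language table); anything unmapped falls back to "other".
    """
    if "other" not in translations:
        raise ValueError('the "other" form is mandatory')
    fallbacks = fallbacks or {}
    seen = set()
    # Walk the fallback chain until a translated form is found; the
    # `seen` set guards against a cyclic fallback table.
    while category not in translations and category not in seen:
        seen.add(category)
        category = fallbacks.get(category, "other")
    return translations.get(category, translations["other"])
```

With only "other" translated, every category resolves to it, so a translator can start with one form and add variants later without anything being renumbered.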
2014-10-29 10:58 GMT+02:00 Philippe Verdy verdy_p@wanadoo.fr:
I'd then suggest that the translation interface allow selecting BOTH the number and the grammatical case to map them to one of the possible forms, and then let the adapter interface make the necessary mapping to the precise symbolic form or one of its fallbacks.
I18n and l10n are a complex topic. We are trying to keep it as simple as possible for translators and developers. What you propose does not bring more power to the system; it only makes it more complex.
-Niklas
Only complex for developers. The actual need is that translators start by translating one form, later realize that they need another form, and should then be able to specify for which case/number they want to specialize a variant. The interface can then offer a "detailed" view summarizing how the various forms are mapped, including their fallbacks. Translators can then just click to confirm the proposed fallbacks, or copy-paste another existing translated form for the same base resource. All forms should be grouped, and a simple view could list just those that are different and summarize the cases/numbers to which they apply.
Internally the fallback mapping is part of the support code, and translators won't have to focus on it. In the simplest case, they'll start by translating the default plural rule ("other" in CLDR), used for undetermined numbers, in the nominative case (or the effective case when translating full sentences). The internal Gettext numbering of forms (plural, case...) is technical and should not be exposed to translators, but they should have clear labelling of these cases and possible test cases for various numbers in the detailed view. Adding variant forms should remain optional and should not impact the language fallbacks (language fallbacks should never be used once the default "other" form is translated).
See for example the translators' interface on Facebook, which allows adding optional variants based on explicit conditions, but where all of them remain optional, with a common fallback to the default form.