Langcom August 2017

langcom@lists.wikimedia.org

7 participants
6 discussions

[i18n software news] Collation for Bashkir
by Amir E. Aharoni 24 Sep '17

Hi,

Another edition of i18n software news!

Yesterday, a change was deployed in the Bashkir Wikipedia: the categories are now sorted in the correct alphabetical order.

Bashkir, like many languages of the Soviet Union, uses the Cyrillic alphabet with several extra characters. Without proper software support, the extra letters are sorted according to their Unicode character number order, which is not very useful. For example, the letter Ө is supposed to be in the middle of the alphabet, between О and П, but without correct collation it sorts at the end. As a result, Ufa (Өфө), the capital of Bashkortostan, used to appear at the very end of the "Capitals of Russian regions" category [1]; now it appears correctly before П.

This could be resolved by adding the collation for this language to CLDR and ICU, and I filed a ticket about this with CLDR [2]. Actually getting it added and deployed is a long process, but the MediaWiki developer Brian Wolff provided a good interim solution in MediaWiki code itself. The infrastructure code around it is surprisingly tricky, but to simply add a new alphabet, you just need to create a file like this:
https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/c…

When the collation is added to CLDR and ICU, this stopgap solution can be removed from MediaWiki.

As far as I can see, Bashkir is the first language for which such a comprehensive solution was made inside MediaWiki, and it is needed for many others. I'll start looking for other languages where this is needed. My process would be something like this:

1. Find a language in which there is a Wikipedia with incorrect collation.
2. Find the correct alphabetical order, using a grammar book or a dictionary, and confirm it with editors in that language [3].
3. Submit a ticket to CLDR.
4. Add a file with an alphabet, like the Bashkir file above, to MediaWiki core.
5. Get it reviewed, merged, and deployed.
6. Deploy the change to the projects in that language.
7. Run a script that converts the categories to the new collation.

(Steps 5 and 6 sound repetitive because the collation needs to be explicitly enabled for each wiki. I filed another bug [4], which suggests defining a default collation per language, so that step 6 won't be needed.)

If anybody has better suggestions about working with CLDR and ICU and getting them to add and release these collation files faster, I'll be very happy to hear them.

[1] http://bit.ly/2sWLJaX
[2] http://unicode.org/cldr/trac/ticket/10195
[3] For the confirmation about Bashkir see https://phabricator.wikimedia.org/T162823 .
[4] https://phabricator.wikimedia.org/T164985

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
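To illustrate the sorting problem described in this message, here is a minimal Python sketch that compares Unicode code point order with an alphabet-based sort key. It is not the MediaWiki implementation, which is written in PHP and is more involved. The names BASHKIR_ALPHABET, RANK, and collation_key are hypothetical, and the alphabet string is an assumption that would need to be confirmed with editors, as in step 2 of the process.

```python
# Assumed Bashkir alphabet order (uppercase). Only the position of Ө
# between О and П is stated in the message; the rest is an assumption
# that should be checked against a dictionary or with native editors.
BASHKIR_ALPHABET = "АБВГҒДҘЕЁЖЗИЙКҠЛМНҢОӨПРСҪТУҮФХҺЦЧШЩЪЫЬЭӘЮЯ"

# Map each letter to its position in the alphabet.
RANK = {letter: index for index, letter in enumerate(BASHKIR_ALPHABET)}


def collation_key(title):
    """Build a sort key that follows the alphabet above.

    Letters found in the alphabet are ranked by their position.
    Any other character sorts after all letters, by code point.
    """
    key = []
    for ch in title.upper():
        if ch in RANK:
            key.append((0, RANK[ch]))
        else:
            key.append((1, ord(ch)))
    return key


titles = ["Пермь", "Өфө", "Москва", "Орёл"]

# Plain sorting compares code points, so Ө (U+04E8) lands after П (U+041F).
print(sorted(titles))
# ['Москва', 'Орёл', 'Пермь', 'Өфө']

# With the alphabet-based key, Өфө sorts between О and П.
print(sorted(titles, key=collation_key))
# ['Москва', 'Орёл', 'Өфө', 'Пермь']
```

A real collation, such as the ones CLDR and ICU provide, also handles case, digits, punctuation, and multi-level comparison; this sketch only shows why a per-language alphabet is needed at all.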

Board liaison
by James Heilman 31 Aug '17

Hey All,

I have taken on the role of board liaison to the language committee. As such, I am wondering if you can add me to this mailing list. I do realize that this group functions fairly autonomously and have no intention to change that.

--
James Heilman
MD, CCFP-EM, Wikipedian
The Wikipedia Open Textbook of Medicine

Saraiki Wikipedia marked as eligible
by Satdeep Gill 30 Aug '17

Hi everyone,

I just marked Saraiki Wikipedia as eligible. [1] There is some controversy regarding this, but according to my analysis, it should be eligible.

1. https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Saraiki

Regards
Satdeep Gill
Strategy Coordinator, Wikimedia Foundation <https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Track_B#SG…>
Co-founder, Punjabi Wikimedians <https://meta.wikimedia.org/wiki/Punjabi_Wikimedians>
Treasurer, Affiliations Committee <https://meta.wikimedia.org/wiki/Affiliations_Committee>
Member, Language Committee <https://meta.wikimedia.org/wiki/Language_committee>

Santali Wikipedia marked as eligible
by Satdeep Gill 29 Aug '17

Hi,

Santali is one of the scheduled languages of India [1] and I have marked Santali Wikipedia as eligible. [2]

1. https://en.wikipedia.org/wiki/Eighth_Schedule_to_the_Constitution_of_India
2. https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Santali

Regards
Satdeep Gill
Strategy Coordinator, Wikimedia Foundation <https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Track_B#SG…>
Co-founder, Punjabi Wikimedians <https://meta.wikimedia.org/wiki/Punjabi_Wikimedians>
Treasurer, Affiliations Committee <https://meta.wikimedia.org/wiki/Affiliations_Committee>
Member, Language Committee <https://meta.wikimedia.org/wiki/Language_committee>

Guianan Creole
by MF-Warburg 28 Aug '17

Is this eligible? <https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Guiana…>

I suppose so.

Ho and Warang Citi
by Amir E. Aharoni 02 Aug '17

Hi,

There is a proposal to add support for the Ho language to translatewiki: https://translatewiki.net/wiki/Thread:Support/Request_to_start_a_new_langua… .

It has not been implemented so far. The explanation is that the request is to do it in the Warang Citi writing system, and Ethnologue says that this script is "no longer in use".

I strongly suspect that Ethnologue is not quite correct on this matter, because there are three sources that contradict it:

* the encoding proposal by Michael Everson
* the page at Scriptsource to which Ethnologue itself links
* an article by Norman Zide, linked from Scriptsource

I have no direct knowledge of this language, but the sources above seem more convincing to me than Ethnologue itself.

The remaining question, however, is whether we should add more than one variant for this language (hoc-wara, hoc-deva, and perhaps hoc-latn), or whether it should be just hoc, assumed to be written in Warang Citi.

Thanks!

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
