Hi,
Another edition of i18n software news!
Yesterday, a change was deployed in the Bashkir Wikipedia: The categories
are now sorted in the correct alphabetical order.
Bashkir, like many languages of the Soviet Union, uses the Cyrillic
alphabet with several extra characters. Without proper software support,
the extra letters are sorted according to their Unicode code point
order, which is not very useful. For example, the letter Ө is
supposed to be in the middle of the alphabet between О and П, but without
correct collation it ends up at the end. So Ufa (Өфө), the capital of
Bashkortostan, used to appear at the very end of the alphabet in the "Capitals of
Russian regions" category [1], but now it appears correctly before П.
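To illustrate the difference, here is a minimal Python sketch. It is not
the MediaWiki implementation. The alphabet string is only the О, Ө, П
fragment mentioned above, not the full Bashkir alphabet, and the city
names other than Өфө are sample strings chosen for the example. The
function name alphabet_key is hypothetical.

```python
# Only the fragment of the alphabet discussed in the text (assumption:
# a real collation would list every letter of the alphabet here).
ALPHABET_FRAGMENT = "ОӨП"
RANK = {letter: index for index, letter in enumerate(ALPHABET_FRAGMENT)}


def alphabet_key(word):
    """Build a sort key that follows ALPHABET_FRAGMENT instead of code points."""
    key = []
    for ch in word.upper():
        if ch in RANK:
            # Known letters sort by their position in the alphabet.
            key.append((0, RANK[ch]))
        else:
            # Unknown characters fall back to code point order, after known ones.
            key.append((1, ord(ch)))
    return key


words = ["Пермь", "Өфө", "Орёл"]

# Plain sorting compares code points: Ө is U+04E8, far after П (U+041F).
by_code_point = sorted(words)

# Sorting with the custom key places Ө between О and П.
by_alphabet = sorted(words, key=alphabet_key)

print(by_code_point)
print(by_alphabet)
```

The first list ends with Өфө, and the second has it between Орёл and
Пермь, which is the behavior described above.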
This could be resolved by adding the collation for this language to CLDR
and ICU, and I filed a ticket about this with CLDR [2]. Actually getting it
added and deployed is a long process, but the MediaWiki developer Brian
Wolff provided a good interim solution in MediaWiki code itself. The
infrastructure code around it is surprisingly tricky, but to simply add a
new alphabet, you just need to create a file like this:
https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/c…
When it is added to CLDR and ICU, this stopgap solution can be removed from
MediaWiki.
As far as I can see, Bashkir is the first language for which such a
comprehensive solution was made inside MediaWiki, and it is needed for many
others. I'll start looking for other languages where this is needed. My
process would be something like this:
1. Find a language in which there is a Wikipedia with incorrect collation.
2. Find the correct alphabetical order, using a grammar book or a
dictionary, and confirm it with editors in that language [3].
3. Submit a ticket to CLDR.
4. Add a file with an alphabet, like the Bashkir file above, to MediaWiki
core.
5. Get it reviewed, merged, and deployed.
6. Deploy the change to the projects in that language.
7. Run a script that converts the categories to the new collation.
(Steps 5 and 6 sound repetitive because the collation needs to be
explicitly enabled for each wiki. I filed another bug [4], which suggests
defining a default collation per language, so that step 6 won't be needed.)
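The idea of a per-language default can be sketched roughly like this in
Python. This is only an illustration of the lookup order, not MediaWiki
configuration. All table names, wiki identifiers, and collation names
below are hypothetical placeholders.

```python
# Hypothetical collation names and tables, for illustration only.
FALLBACK_COLLATION = "codepoint-default"

# Proposed: one default per language, defined once in the software.
DEFAULT_COLLATION_BY_LANGUAGE = {
    "ba": "custom-ba",
}

# Current situation: each wiki has to be configured explicitly.
PER_WIKI_OVERRIDES = {
    "examplewiki": "custom-example",
}


def collation_for(wiki_id, language_code):
    """Pick a collation: explicit wiki setting, then language default, then fallback."""
    if wiki_id in PER_WIKI_OVERRIDES:
        return PER_WIKI_OVERRIDES[wiki_id]
    return DEFAULT_COLLATION_BY_LANGUAGE.get(language_code, FALLBACK_COLLATION)


# A Bashkir-language wiki gets the language default without a per-wiki entry.
print(collation_for("some-bashkir-wiki", "ba"))
```

With such a fallback chain, a new project in a language that already has
a default would not need a separate per-wiki deployment step.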
If anybody has better suggestions about working with CLDR and ICU and
getting them to add and release these collation files faster, I'll be very
happy to hear them.
[1] http://bit.ly/2sWLJaX
[2] http://unicode.org/cldr/trac/ticket/10195
[3] For the confirmation about Bashkir see
https://phabricator.wikimedia.org/T162823 .
[4] https://phabricator.wikimedia.org/T164985
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi,
There is a proposal to add support for the Ho language to translatewiki:
https://translatewiki.net/wiki/Thread:Support/Request_to_start_a_new_langua…
.
It has not been implemented so far, and the explanation is that the request is
to do it in the Warang Citi writing system, and Ethnologue says that it is
"no longer in use". I strongly suspect that Ethnologue is not quite correct
on this matter, because there are three sources that contradict it:
* the encoding proposal by Michael Everson
* the page at Scriptsource to which Ethnologue itself links
* an article by Norman Zide, linked from Scriptsource
I have no direct knowledge of this language, but the sources above seem
more convincing to me than Ethnologue itself.
The remaining question, however, is whether we should add more than one
variant for this language (hoc-wara, hoc-deva, and perhaps hoc-latn), or
just hoc, which would be assumed to be written in Warang Citi.
Thanks!
Hi,
Apparently there is some activity in the Coptic Incubator Wikipedia:
portal: https://incubator.wikimedia.org/wiki/Wp/cop
activity: https://tools.wmflabs.org/meta/catanalysis/index.php?cat=0&title=Wp/cop&wiki=incubatorwiki#distribution_2017-02
request:
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Coptic…
And there's a request to translate MediaWiki into this language:
https://incubator.wikimedia.org/w/index.php?title=Incubator:Community_Portal&oldid=4099303#Translatewiki
However, translatewiki and UniversalLanguageSelector are not yet enabled for
this language. As far as I know, the language is not exactly alive as a
modern language. It's definitely eligible for Wikisource, so it can be in
the UniversalLanguageSelector (although I need to verify what the
autonym is - "ϯⲙⲉⲧⲣⲉⲙⲛ̀ⲭⲏⲙⲓ"?).
But what about a Wikipedia, and what about translating the MediaWiki user
interface strings into it? These would probably be revivalist projects
because there are no L1 speakers.
If it's not eligible, I'd rather not enable it on translatewiki.
Personally, I would support marking it as eligible, but are there other
opinions?
It was already rejected in 2008:
https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Coptic_2
... But that was long ago, and maybe it's worth reconsidering?
Thank you!
This is to inform the public following this list that we discussed
inactive members on the private list (as it's a personal matter).
After mailing Berto d'Sera, Arria Belli and ZaDiak, none of whom have
posted to the mailing list in years, we didn't get any responses from them
and so decided to remove them as members. At the same time, we adopted a
general inactivity policy: Members who don't write anything on the list(s)
for 2 years are removed. They will first be contacted to see if they wish
to continue participating.
Hi,
I am trying to clean up some autonyms that we have in MediaWiki.
The Kongo language (code kg) is one that baffles me completely. It seems to
have a different autonym everywhere I look. Is
it "Kongo"?
"Kôngo"?
"Koongo"?
"KiKongo"?
"KiKôngo"?
"Kimanianga"?
"Kituba Kôngo"?
Just "Kituba"?
Something else?
I tried looking for any dictionary or grammar online and couldn't find
anything.
I guess that there are several standards, but I need to decide on a single
thing that will appear in the sidebar as an interlanguage link, so it must
be something that balances between usefulness for people who actually read
and write in this language and some kind of standard. Currently it's
"Kongo", which is simple, but I'm not entirely sure it's correct.
Your help will be greatly appreciated. Thanks!