Hi,
Another edition of i18n software news!
Yesterday, a change was deployed in the Bashkir Wikipedia: The categories
are now sorted in the correct alphabetical order.
Bashkir, like many languages of the Soviet Union, uses the Cyrillic
alphabet with several extra characters. Without proper software support,
the extra letters are sorted according to their Unicode code point
order, which is not very useful. For example, the letter Ө is
supposed to be in the middle of the alphabet between О and П, but without
correct collation it sorts at the end, so Ufa (Өфө), the capital of
Bashkortostan, appeared at the very end of the alphabet in the "Capitals of
Russian regions" category [1], but now it correctly appears before П.
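To illustrate the difference, here is a minimal Python sketch comparing
plain code point sorting with sorting by an explicit alphabet. The alphabet
string is a hypothetical simplification (the Russian alphabet with only Ө
inserted between О and П), not the full Bashkir alphabet, and the city
names are just sample data. This is not MediaWiki's implementation.

```python
# Hypothetical, simplified alphabet: the Russian Cyrillic alphabet with
# only the extra letter Ө inserted between О and П. The real Bashkir
# alphabet has more extra letters than this.
ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОӨПРСТУФХЦЧШЩЪЫЬЭЮЯ"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def collation_key(word: str) -> tuple[int, ...]:
    """Map each letter to its position in ALPHABET (case-insensitive).

    Characters outside the alphabet sort after all known letters,
    ordered among themselves by code point.
    """
    return tuple(RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word.upper())


words = ["Пермь", "Өфө", "Омск", "Ярославль"]

# Plain code point order: Ө (U+04E8) comes after Я (U+042F).
print(sorted(words))                     # ['Омск', 'Пермь', 'Ярославль', 'Өфө']
# Alphabetical order with Ө between О and П.
print(sorted(words, key=collation_key))  # ['Омск', 'Өфө', 'Пермь', 'Ярославль']
```

The key is a tuple of letter positions, so Python's ordinary tuple
comparison gives dictionary-style ordering once each letter has its
intended rank.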
This could be resolved by adding the collation for this language to CLDR
and ICU, and I filed a ticket about this with CLDR [2]. Actually getting it
added and deployed is a long process, but the MediaWiki developer Brian
Wolff provided a good interim solution in MediaWiki code itself. The
infrastructure code around it is surprisingly tricky, but to simply add a
new alphabet, you just need to create a file like this:
https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/c…
When it is added to CLDR and ICU, this stopgap solution can be removed from
MediaWiki.
As far as I can see, Bashkir is the first language for which such a
comprehensive solution was made inside MediaWiki, and it is needed for many
others. I'll start looking for other languages where this is needed. My
process would be something like this:
1. Find a language for which there is a Wikipedia with incorrect collation.
2. Find the correct alphabetical order, using a grammar book or a
dictionary, and confirm it with editors in that language [3].
3. Submit a ticket to CLDR.
4. Add a file with an alphabet, like the Bashkir file above, to MediaWiki
core.
5. Get it reviewed, merged, and deployed.
6. Deploy the change to the projects in that language.
7. Run a script that converts the categories to the new collation.
(Steps 5 and 6 sound repetitive because the collation needs to be explicitly
enabled for each wiki. I filed another bug [4] suggesting a default
collation per language, so that step 6 won't be needed.)
If anybody has better suggestions about working with CLDR and ICU and
getting them to add and release these collation files faster, I'll be very
happy to hear them.
[1] http://bit.ly/2sWLJaX
[2] http://unicode.org/cldr/trac/ticket/10195
[3] For the confirmation about Bashkir see
https://phabricator.wikimedia.org/T162823 .
[4] https://phabricator.wikimedia.org/T164985
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi,
There is a proposal to add support for the Ho language to translatewiki:
https://translatewiki.net/wiki/Thread:Support/Request_to_start_a_new_langua…
It has not been implemented yet, because the request is to use the
Warang Citi writing system, and Ethnologue says that this script is
"no longer in use". I strongly suspect that Ethnologue is not quite correct
on this matter, because there are three sources that contradict it:
* the encoding proposal by Michael Everson
* the page at Scriptsource to which Ethnologue itself links
* an article by Norman Zide, linked from Scriptsource
I have no direct knowledge of this language, but the sources above seem
more convincing to me than Ethnologue itself.
The remaining question, however, is whether we should add more than one
variant for this language (hoc-wara, hoc-deva, and perhaps hoc-latn), or
just hoc, assumed to be written in Warang Citi?
Thanks!
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
The community is getting impatient that LangCom is leaving so many items in limbo, especially the Beta Wikiversity proposal. (See https://meta.wikimedia.org/wiki/Talk:Closing_projects_policy#This_is_not_wo….) That proposal passed its fourth anniversary last week! In April, you discussed this question and seemed on the verge of approving the closure, but then the discussion stopped and no action was taken.
There are currently four proposals that have been open for two years or longer, and another two that have been open for over a year. I strongly urge you to take action at least on those, lest the community step in and make its own decisions. As a reminder, these are:
* Move Beta Wikiversity to Incubator
* Deletion of Moldovan Wikipedia and Wiktionary (two separate proposals) (already closed and locked)
* Deletion of Marshallese projects (already closed and locked)
* Closure of Limburgish and Bosnian Wikibooks (two proposals)
I'd venture to say that there is no groundswell insisting on the Marshallese, Limburgish or Bosnian proposals, and you could easily close those proposals as "not done" with little fanfare. The others I cannot really comment on.
Steven White (StevenJ81)
Hi all,
I think Ingush Wikipedia can be approved, from the activity viewpoint. The
translation of the most-used messages is complete
(<http://tools.wmflabs.org/robin/?tool=codelookup&code=inh>) and there has
been quite high activity for almost ten months now
<https://tools.wmflabs.org/meta/catanalysis/index.php?cat=0&title=Wp/inh&wik…>.
Now we would of course need verification of the content. Searching the
archives, I found a mail from Amir from 10 November 2011. Back then, a
linguist had said the language in the test-wiki was not quite what would be
expected from literary Ingush. However, the current editors are all
different from the ones that were active five years ago.
Amir, could you check with that linguist or someone else from the Ingush
State University again about the quality of the content?
Best regards, MF-W
One issue: voting.
== Voting ==
This is also a proposal, so please read it and comment if you disagree or
want to add anything.
1) No voting
1.1) According to the Closing projects policy [1], a particular member
of the committee analyzes the discussion and, if they decide that the
project should be closed, sends the request to the WMF Board.
1.2) Clear-cut situations for making a language eligible for Wikimedia
projects: the language has a valid ISO 639-3 code, there are no
significant issues in relation to the language itself, the population
of speakers is significant, and the request is made by a native speaker.
In this case, any committee member can mark the language / project eligible.
1.3) Approval without obvious formal requirements. No project will be
approved without them.
2) Simple majority (of those who expressed opinion)
2.1) Eligibility of a language with a valid ISO 639-3 code, but
without a significant population of native speakers. (Note: this covers
ancient, constructed and revived languages, and languages with a small
number of speakers.)
2.2) Eligibility of a language without a valid ISO 639-3 code, but
valid BCP 47 code. (Note: this covers Ecuadorian Quechua.)
2.3) Eligibility of a language with significant collision between
prescriptive and descriptive information. (Note: this covers
"macrolanguages".)
2.4) Project approval if not 1.3.
3) 2/3 majority (of those who expressed opinion)
3.1) Any change of the rules, including the committee's role in
possible changes of the Language proposal policy [2] and Closing
projects policy [1].
4) Consensus (of those who expressed opinion)
4.1) A new member of the Language committee should not be opposed by
any of the current committee members.
[1] https://meta.wikimedia.org/wiki/Closing_projects_policy
[2] https://meta.wikimedia.org/wiki/Meta:Language_proposal_policy
Hi all,
First of all, your work is being appreciated.
That being said, I have been grinding my teeth a little over the
discussions on your talk page on Meta:
https://meta.wikimedia.org/wiki/Talk:Language_committee#Request_for_launchi…
It is not so much the outcome of the process that is somewhat worrying me,
but the fact that it is apparently hard for people to see what's going on.
While this mailing list is apparently open, it is not transparent in a
practical sense what the committee decides, when it decides and how it
decides. People can read this list, but what they really are looking for is
a short committee statement on the outcome.
As I understand it, StevenJ81 has been doing some of this in a personal
capacity, to the best of his abilities summarizing what's going on here.
I can't really tell you how to conduct your business of course, but I would
like to ask you to consider whether you could summarize decisions in a few
lines of text in a quotable manner, when the committee comes to a
conclusion. That way it is clear to everyone that a discussion is closed, and
that arguments were considered. Who then copies that statement to Meta (one
of you, or Steven) is another matter.
This is the first time in a long while I had a closer run-in with your
committee's work, but that will be the case for most of the people who make
an application. I hope that for their sake, you will consider my
suggestion.
Kind regards,
Lodewijk (Effeietsanders)