Just yesterday I managed to get https://gerrit.wikimedia.org/r/#/c/49776/ merged. Based heavily on Tim's work on IcuCollation, it *finally* allows articles to be sorted correctly on category pages for 67 languages written in the Latin, Greek and Cyrillic alphabets.
I also created https://bugzilla.wikimedia.org/show_bug.cgi?id=45443 to track the process of getting this deployed to Wikimedia wikis. The process is already underway for uk.wiki and pl.wiki; if anybody technical wishes to get it on their wiki first, please create a sub-bug and start a community discussion/vote - I can provide a testwiki in your language :)
Eventually, I'd like this to be deployed on all wikis in those 67 languages. I'll start poking people about this (and will drop a mail to -ambassadors) once wmf11 is deployed and the change goes live on a few wikis.
Oh no! You mean bug#164 will be solved now in less than nine years? That's a great day, hurray!!!!!!!!! :-))))))) And thanks a lot. When is it scheduled to be deployed?
2013/2/27 Bartosz Dziewoński matma.rex@gmail.com
Just yesterday I managed to get https://gerrit.wikimedia.org/r/#/c/49776/ merged. Based heavily on Tim's work on the IcuCollation, it allows one to *finally* get articles to be correctly sorted on category pages for 67 languages based in latin, greek and cyrillic alphabets.
I also created https://bugzilla.wikimedia.org/show_bug.cgi?id=45443 to track the process of getting this deployed to Wikimedia wikis. The process is already underway for uk.wiki and pl.wiki; if anybody technical wishes to get it on their wiki first, please create a sub-bug and start a community discussion/vote - I can provide a testwiki in your language :)
Eventually, I'd like this to be deployed on all wikis in those 67 languages. I'll start poking people about this (and will drop a mail to -ambassadors) once wmf11 is deployed and the change goes live on a few wikis.
-- Matma Rex
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, 28 Feb 2013 00:11:19 +0100, Bináris wikiposta@gmail.com wrote:
When is it scheduled to be deployed?
The code change itself will go live with MW 1.21wmf11 (see https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap for deployment dates), and I'll try to get the configuration changes deployed on pl.wiki (and possibly uk.wiki as well) shortly afterwards.
There's no Great Deployment Plan (yet), and I don't have enough free time (nor access to WMF resources) to draft one. As I said, I'll mail the -ambassadors list, set up testwikis and submit config change proposals for anyone who wishes to have one, and that's probably all I can do. I'll try to poke some Wikimedia communities about this, though (especially ones with particularly fanciful ;) alphabets).
On 27 Feb 2013, at 23:50, Bartosz Dziewoński matma.rex@gmail.com wrote:
Just yesterday I managed to get https://gerrit.wikimedia.org/r/#/c/49776/ merged. Based heavily on Tim's work on the IcuCollation, it allows one to *finally* get articles to be correctly sorted on category pages for 67 languages based in latin, greek and cyrillic alphabets.
Nice work, Bartosz. Thank you for all your efforts. Don't stop there, go for gold and beyond the three scripts you can read ;).
Cheers!
-- Siebrand Mazeland
Does this need any maintenance/* runs? I want to test this for Belarusian (be + be-tarask), although now I have what I had before the git pull.
-- Regards, Павел Селіцкас/Pavel Selitskas Wizardist @ Wikimedia projects
On Thu, 28 Feb 2013 00:20:09 +0100, Paul Selitskas p.selitskas@gmail.com wrote:
Does this need any maintenance/* runs? I want to test this for Belarusian (be + be-tarask), although now I have what I had before the git pull.
Yes, you need to run maintenance/updateCollation.php and then purge all category pages.
And if you run into any weird display bugs (like letters sorting under headings containing weird symbols), check out https://bugzilla.wikimedia.org/show_bug.cgi?id=43740 .
I had to add 'be-tarask' to $tailoringFirstLetters and set $wgCategoryCollation explicitly to make this thing work. But it damn works! Awesome, thanks!
Can character mapping be also implemented here? For example, in Belarusian letter «Ґ» should be in the same section as «Г», and «Ў» in the same section as «У». It's not an urgent request, just my curiosity.
-- Regards, Павел Селіцкас/Pavel Selitskas Wizardist @ Wikimedia projects
Hmm. The collation chart for "be" [1] doesn't seem to mention Ґ. It does mention ѓ, though, which looks kind of similar to my untrained eye. In any case, if Ґ is missing from the chart by mistake, that is an upstream issue with either the ICU project or the CLDR project (I think).
[1] http://collation-charts.org/icu442/icu442-be.html
-bawolff
On Thu, 28 Feb 2013 00:33:57 +0100, Paul Selitskas p.selitskas@gmail.com wrote:
I had to add 'be-tarask' to $tailoringFirstLetters and set $wgCategoryCollation explicitly to make this thing work.
Yes, it's not enabled by default. That should probably wait until the support is more battle-tested :)
Can character mapping be also implemented here? For example, in Belarusian letter «Ґ» should be in the same section as «Г», and «Ў» in the same section as «У». It's not an urgent request, just my curiosity.
I created a testwiki in Belarusian with the uca-be collation to test this: http://users.v-lo.krakow.pl/~matmarex/testwiki-be/index.php?title=%D0%9A%D0%...
It seems like Ґ and Г behave correctly. I don't know why Ў and У are separate; probably most languages they're used in consider them entirely separate letters. This is certainly doable, though; we simply need to make Ў not create a heading in the same way we made ё create one; it should start sorting under У then. I didn't realize this kind of behavior is possible :)
(If they are sorted / separated differently on your install, you probably need to run the maintenance/languages/generateCollationData.php script - see https://bugzilla.wikimedia.org/show_bug.cgi?id=43740 .)
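To illustrate the heading behaviour discussed above, here is a toy Python sketch. It does not use ICU or any MediaWiki code; the alphabet string, the `HEADINGS` list and the function names are all assumptions made up for this example. It only shows the idea: a letter that takes part in the sort order but is not in the list of heading letters (Ґ here) ends up in the section of the preceding heading letter (Г), while a letter that is in the list (Ў) gets a section of its own.

```python
# Toy model only, not ICU: the Belarusian alphabet in dictionary order,
# with the optional letter Ґ placed right after Г.
ALPHABET = "АБВГҐДЕЁЖЗІЙКЛМНОПРСТУЎФХЦЧШЫЬЭЮЯ"

# Letters that open their own section on a category page (assumption for
# this sketch): every letter except Ґ, so Ґ falls into the Г section.
HEADINGS = [ch for ch in ALPHABET if ch != "Ґ"]

# Position of each letter in the alphabet, used for sorting.
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def sort_key(title):
    """Sort by alphabet position; unknown characters sort after all letters."""
    return [(RANK.get(ch, len(ALPHABET)), ch) for ch in title.upper()]


def heading_for(title):
    """Return the section heading a title is listed under."""
    first = title[:1].upper()
    rank = RANK.get(first)
    if rank is None:
        # Not a letter of the alphabet: use the character itself as heading.
        return first
    # Pick the last heading letter that does not sort after the first letter.
    return [h for h in HEADINGS if RANK[h] <= rank][-1]


def categorize(titles):
    """Group sorted titles into sections, keeping the section order."""
    sections = {}
    for title in sorted(titles, key=sort_key):
        sections.setdefault(heading_for(title), []).append(title)
    return sections


if __name__ == "__main__":
    pages = ["Урок", "Ґанак", "Дом", "Гара", "Ўлада", "Горад"]
    for heading, members in categorize(pages).items():
        print(heading, members)
```

Removing Ў from `HEADINGS` in this sketch would make titles starting with Ў list under У instead, which is the alternative described above.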
The result in the link you provided is ideal. Г and Ґ are in one bucket, while У and Ў are separated. That's what we need.
Great job done!
-- Regards, Павел Селіцкас/Pavel Selitskas Wizardist @ Wikimedia projects
On Thu, 28 Feb 2013 16:28:43 +0100, Paul Selitskas p.selitskas@gmail.com wrote:
The result in the link you provided is ideal. Г and Ґ are in one bucket, while У and Ў are separated. That's what we need.
Ah, that's great, I misunderstood. :)
Hi, I googled for IcuCollation and found this on their website: "Starting in release 1.8, the ICU Collation Service is updated to be fully compliant to the Unicode Collation Algorithm (UCA) ( http://www.unicode.org/unicode/reports/tr10/ ) and conforms to ISO 14651."
My question: will we use a version of IcuCollation that is later than 1.8? Asking whether IcuCollation supports the latest version of the UCA is probably too much to ask for... The latest UCA would give us alphabetic characters after the characters of a default script.
Thanks, GerardM
IcuCollation is the name of the MediaWiki class. The actual underlying code comes from a software project called ICU (or, more specifically, ICU4C). Which version is used depends on which version the PHP intl extension that MediaWiki runs on is compiled against. Version 1.8 is really, really old (which makes me think you may have found the wrong software project?). The latest stable release is 50, and 51 is going to be released soon. We use version 4.2 of the ICU library, which implements CLDR 1.7 and Unicode 5.1; that is a tad older, but not horribly so.
I believe people want to update ICU to a newer version for better Chinese collation. I imagine things would eventually be updated to a version that has the script reordering. OTOH, ICU library updates have a high cost, so maybe not, unless there is a bigger benefit than script reordering. (Note that script reordering is already an option. The only difference in the newest UCA is that script reordering is the default instead of an option.)
TL;DR: no. It's compatible with a version of the UCA, but the UCA is not a fixed standard and it changes.
-bawolff