Ambassadors,
Sorry for being silent for so long. I have a (maybe) important update for CirrusSearch. I'm currently in the process of pushing Unicode normalization [0] to many languages [1]. In some languages this will (hopefully!) be great and in others it won't change anything. If this has broken anything, please let me know. Reply, file a bug, or do whatever is easiest for you.
Thanks for reading!
Nik
[0]: NFKC with case folding http://unicode.org/reports/tr15/#Norm_Forms for those who want to read more/already know and love Unicode normalization.
[1]: All languages _but_ these: arabic, armenian, basque, brazilian, bulgarian, catalan, chinese, czech, danish, dutch, finnish, french, galician, german, greek, hindi, hungarian, indonesian, italian, norwegian, persian, portuguese, romanian, russian, spanish, swedish, turkish, thai. It's a long story why they aren't getting it, but they will in time if everything goes well....
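For readers who have not met [0] before, here is a minimal sketch of what NFKC with case folding does to text. It uses only Python's standard library and is an illustration, not the analyzer CirrusSearch actually runs. The function name nfkc_casefold is made up for this example, and the full Unicode NFKC_Casefold mapping also drops default-ignorable code points, which this approximation does not do.

```python
import unicodedata


def nfkc_casefold(text: str) -> str:
    """Approximate NFKC with case folding (hypothetical helper, illustration only)."""
    # Step 1: compatibility normalization, e.g. ligatures and fullwidth
    # letters are replaced by their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Step 2: case folding, a more aggressive form of lowercasing.
    folded = normalized.casefold()
    # Step 3: normalize again, since folding can produce new sequences.
    return unicodedata.normalize("NFKC", folded)


samples = [
    "\ufb01le",                  # "file" written with the fi ligature
    "\uff37\uff49\uff4b\uff49",  # "Wiki" in fullwidth letters
    "Stra\u00dfe",               # German sharp s
]

for sample in samples:
    print(repr(sample), "->", repr(nfkc_casefold(sample)))
```

If both the indexed text and the search query go through the same function, a query typed as "WIKI" matches a page that spells the word with fullwidth letters, because both reduce to "wiki". That is the kind of improvement some languages should see, while text that is already plain lowercase is left unchanged.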
Hi Nik,
Thanks for the update, and I look forward to the output.
Some questions.
1) Is this going out independently of the WMF7 update? Or is it part of that update and silent in the detail?
2) This will be in all English language wikis, including the central wikimedia.org wikis?
3) What is happening with the multilanguage wikis? e.g. commons, wikidata (or are they considered English?)
Thanks. Regards, Billinghurst
On Mon, 2 Jun 2014 12:21:53 -0400, Nikolas Everett neverett@wikimedia.org wrote:
On Mon, Jun 2, 2014 at 1:01 PM, billinghurst billinghurst@gmail.com wrote:
Hi Nik,
Thanks for the update, and I look forward to the output.
Some questions.
1) Is this going out independently of the WMF7 update? Or is it part of that update and silent in the detail?
Independently of the update. Technically the code to support it was pushed to the cluster weeks ago and we just had to turn it on. I didn't announce it when the support was pushed because that wouldn't have been interesting. I waited so long to flip the switch because I was away at a conference and would have been unable to jump on issues if any came up. I messed up the communication regarding actually turning it on because I've been excited about it for days. Imagine me thinking about unicode normalization while being bounced around in my little Dash 8. Anyway, my last email was my attempt to fix my communications mistake.
2) This will be in all English language wikis, including the central
wikimedia.org wikis?
Yes, it'll go to all the wikis that think of themselves as English. It'll take a few days I think for them all to soak up the change. The first wikis to get it were actually Hebrew because the analyzer that I was using for them was flaking out. They only got it a few hours ago.
I did just realize I made the mistake of sending it to the en* wikis before the test* and mediawiki.org wikis. I'm not doing well today.... Anyway, it is pushed to the test wikis and mediawiki.org now. As of sending this email it is in all Hebrew wikis, and English wikivoyage, wikiversity, wikisource, wikiquote, wikinews, and wikibooks. English Wiktionary is getting it right now.
3) What is happening with the multilanguage wikis? e.g. commons, wikidata (or are they considered English?)
They are still going to pretend they are in English. They'll get this. Other than that, nothing yet. We have plans to do better but that's lower down the list unfortunately.
Nik
wikitech-ambassadors@lists.wikimedia.org