Now that MobileFrontend is using JSON for its language files, I jumped on this to create a script that makes adding language strings easier - basically a command line interface called `make message` that edits the JSON files to add an English message and its QQQ entry, and maintains alphabetical ordering [1].
Recently this was used, and then some updates came in from translatewiki.net.
I ran my `make message` script and noticed it made some changes to the entries that came from the translation updater bot [2]. I was wondering - what would be the correct way to store these messages? Do I need to update my script, or should the translation updater bot be doing things differently?
"아라" or "\uc544\ub77c"?
"\u003Ccode\u003E" or "<code>"?
Thanks in advance for your opinions!
[1] https://gerrit.wikimedia.org/r/#/c/119637/
[2] https://gist.github.com/jdlrobson/9767604
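For reference, both spellings in each pair above decode to the same string, so the choice only affects how the files look on disk and in diffs. A minimal sketch in Python (an assumption: the language of the `make message` script is not stated in this message, and `same_after_parsing` is a hypothetical helper name):

```python
import json

# Each pair holds two spellings of the same JSON string literal,
# taken from the question above.
pairs = [
    ('"아라"', '"\\uc544\\ub77c"'),
    ('"<code>"', '"\\u003Ccode\\u003E"'),
]


def same_after_parsing(raw_a, raw_b):
    """Return True if both JSON texts decode to the same Python value."""
    return json.loads(raw_a) == json.loads(raw_b)


for literal, escaped in pairs:
    print(literal, escaped, same_after_parsing(literal, escaped))
```

Any conforming JSON parser should treat the two forms alike; only the bytes written to the file differ.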
On Mar 25, 2014 3:11 PM, "Jon Robson" jdlrobson@gmail.com wrote:
> "아라" or "\uc544\ub77c" "\u003Ccode\u003E" or "<code>" ?
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Let's not escape Unicode characters unnecessarily - humans read those files too.
JSON files are allowed to contain most characters as unescaped UTF-8.
-bawolff
On Mar 25, 2014 7:17 PM, "Brian Wolff" bawolff@gmail.com wrote:
> On Mar 25, 2014 3:11 PM, "Jon Robson" jdlrobson@gmail.com wrote:
> > "아라" or "\uc544\ub77c" "\u003Ccode\u003E" or "<code>" ?
I recently looked at JSON encoding for a different project. The conclusion there, too, was pretty much to never use Unicode escapes except when demanded by the spec.
As often, non-BMP stuff may be painful. In JSON, a code point outside the BMP can be written either as the raw character or as a surrogate pair of Unicode escapes. Whether you prefer to believe that the JSON parser of your consumer is less likely to choke on astral plane characters or on escaped surrogate pairs is up for debate (I have seen both go wrong).
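To make the two options concrete, here is a small Python sketch. The sample character U+1F600 and the helper name `can_encode_utf8` are illustrative choices, not something taken from the MobileFrontend files. It shows the same non-BMP code point written as an escaped surrogate pair and as a raw character, plus one way things can go wrong when only half of a pair is present.

```python
import json

ch = "\U0001F600"  # sample code point outside the BMP (illustrative choice)

escaped = json.dumps(ch)                   # default ensure_ascii=True
raw = json.dumps(ch, ensure_ascii=False)   # character kept as-is

print(escaped)  # "\ud83d\ude00"  (one code point, two \u escapes)
print(json.loads(escaped) == json.loads(raw) == ch)  # True


def can_encode_utf8(text):
    """Return True if text can be written out as UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Python's parser accepts a lone surrogate escape, but the resulting
# string cannot be encoded as UTF-8 afterwards.
lone = json.loads('"\\ud83d"')
print(can_encode_utf8(lone))  # False
```

Other parsers may behave differently on the last case, which is part of why either representation can cause trouble.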
For UTF-8, adding ensure_ascii=False to json.dumps would fix it. For the HTML escaping, there is no simple way as far as I know. With some searching you can find some workarounds, or you can consider using https://github.com/simplejson/simplejson
I did point out this issue almost a week ago: https://gerrit.wikimedia.org/r/#/c/119637/4/i18n/qqq.json
-Niklas
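One possible workaround of the kind mentioned above, sketched under assumptions: `dump_messages` is a hypothetical helper, the 4-space indent is a guess at the file layout, and this is not what the `make message` script or the bot actually does. It keeps non-ASCII text readable through `ensure_ascii=False` and then rewrites `<` and `>` as `\u003C` and `\u003E` in the serialized text.

```python
import json


def dump_messages(messages):
    """Serialize messages with raw UTF-8 text but < and > as \\uXXXX escapes."""
    # ensure_ascii=False keeps characters such as 아라 unescaped.
    text = json.dumps(messages, ensure_ascii=False, indent=4)
    # In valid JSON, < and > can only occur inside string literals,
    # so replacing them in the serialized text keeps the document valid.
    for ch in "<>":
        text = text.replace(ch, "\\u%04X" % ord(ch))
    return text


sample = {"example-key": "<code>아라</code>"}  # made-up message for illustration
out = dump_messages(sample)
print(out)
print(json.loads(out) == sample)  # True: the data survives a round trip
```

Because the replacement runs on the already serialized text, it does not depend on any encoder hooks; the trade-off is that keys containing `<` or `>` get escaped as well.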