On Mar 25, 2014 7:17 PM, "Brian Wolff" <bawolff(a)gmail.com> wrote:
On Mar 25, 2014 3:11 PM, "Jon Robson" <jdlrobson(a)gmail.com> wrote:
> Now MobileFrontend is using JSON for languages, I jumped on this to
> create a script to make language addition easier - basically a command
> line interface called `make message` that edits the JSONs to add an
> English message and QQQ code and maintains alphabetical ordering.
> Recently this was used and some updates came from translatewiki.net.
> I ran my `make message` script and noticed it made some changes to
> those from the translation updater bot.
> I was wondering - what would be the correct way to store these messages?
> Do I need to update my script or should Translator bot be doing
> things differently?
> "아라" or "\uc544\ub77c"
> "\u003Ccode\u003E" or "<code>" ?
> Thanks in advance for your opinions!
>  https://gerrit.wikimedia.org/r/#/c/119637/
>  https://gist.github.com/jdlrobson/9767604
I recently looked at JSON encoding for a different project. The conclusion
there, too, was pretty much to never use Unicode escapes except when demanded
by the spec.
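
To make the "only when demanded by the spec" rule concrete, here is a minimal sketch in Python (not the language of the `make message` script or of the translatewiki.net exporter; the `messages` dictionary and its keys are made up for illustration). With `ensure_ascii=False` the encoder leaves "아라" and "<code>" alone and escapes only what the JSON grammar requires, such as double quotes, backslashes and control characters:

```python
import json

# Hypothetical message catalogue, invented for this example.
messages = {
    "greeting": "아라",
    "markup": "<code>",
    "quote": 'say "hi"',
    "tab": "a\tb",
}

# ensure_ascii=False keeps non-ASCII characters as literal text; the double
# quote and the tab are still escaped because the JSON grammar requires it.
readable = json.dumps(messages, ensure_ascii=False, indent=4, sort_keys=True)

# ensure_ascii=True (the default) turns every non-ASCII character into \uXXXX.
escaped = json.dumps(messages, ensure_ascii=True, indent=4, sort_keys=True)

# Both spellings decode to exactly the same data, so the choice only
# affects how readable the file is for humans and how noisy the diffs are.
assert json.loads(readable) == json.loads(escaped) == messages
```

When writing `readable` to disk, the file should be opened with an explicit UTF-8 encoding so the literal characters survive.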
As often, non-BMP stuff may be painful. Surrogate pairs of Unicode escapes
may be used to describe a code point in JSON. Whether you prefer to believe
that the JSON parser of your consumer is less likely to choke on astral
plane characters or on escaped surrogate pairs is up for debate (I have
seen both go wrong).
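
For the non-BMP case, the pairs of Unicode escapes mentioned above (usually called surrogate pairs) look like this. This is again a small Python sketch; U+1F600 is just an arbitrary astral-plane code point picked for illustration. A conforming parser should give the same single code point for both spellings, which is a cheap check to run against whatever consumer you are worried about:

```python
import json

emoji = "\U0001F600"  # U+1F600, a code point outside the BMP

# Default encoding (ensure_ascii=True): the code point is written as two
# \uXXXX escapes, a high surrogate followed by a low surrogate.
as_pair = json.dumps(emoji)

# ensure_ascii=False: the character is written literally.
as_literal = json.dumps(emoji, ensure_ascii=False)

print(as_pair)  # "\ud83d\ude00"

# A correct decoder recombines the pair into one code point.
decoded_pair = json.loads(as_pair)
decoded_literal = json.loads(as_literal)
assert decoded_pair == decoded_literal == emoji
assert len(decoded_pair) == 1
```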
Let's not escape Unicode characters unnecessarily; humans read those files.
JSON files are allowed to contain most characters as unescaped UTF-8.
Wikitech-l mailing list