On 03/10/2013 06:30 AM, Kevin Israel wrote:
On 03/10/2013 12:19 AM, Victor Vasiliev wrote: One thing you should consider is whether to escape non-ASCII characters (characters above U+007F) or to encode them using UTF-8.
"Whatever the JSON encoder we use does".
Python's json.dumps() escapes these characters by default (ensure_ascii=True). If you don't want them escaped (as hex-encoded UTF-16 code units), it's best to decide now, before clients with broken UTF-8 support come into use.
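For anyone following along, the two behaviors look like this (standard library only):

```python
import json

# Default: ensure_ascii=True escapes everything above U+007F
# as hex-encoded UTF-16 code units.
print(json.dumps("naïve"))                      # "na\u00efve"

# With ensure_ascii=False the character is emitted as raw UTF-8.
print(json.dumps("naïve", ensure_ascii=False))  # "naïve"
```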
As long as it does not add newlines, this is perfectly fine protocol-wise.
I recently made a [patch][1] (not yet merged) that would add an opt-in "UTF8_OK" option to FormatJson::encode(). The new option would unescape everything above U+007F (except for U+2028 and U+2029, for compatibility with JavaScript eval()-based parsing).
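The patch itself is against MediaWiki's PHP FormatJson class, but the intended behavior can be sketched in Python (the function name here is mine, purely illustrative):

```python
import json

def encode_utf8_ok(value):
    """Sketch of the proposed UTF8_OK behavior: emit characters above
    U+007F as raw UTF-8, but keep U+2028 (LINE SEPARATOR) and U+2029
    (PARAGRAPH SEPARATOR) escaped, since both are line terminators in
    JavaScript and would break eval()-based parsers if left raw."""
    out = json.dumps(value, ensure_ascii=False)
    # These two characters can only occur inside string literals here,
    # so a plain textual replacement is safe.
    return out.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")
```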
The part between MediaWiki and the daemon does not matter that much (except for hitting the size limit on packets, and even then we are on WMF's internal network, so we should not expect packet loss or fragmentation problems). The daemon extracts the wiki name from the JSON it receives, so it re-encodes the change in the middle anyway.
I hope that getting recent changes in a reasonable format is now just a matter of code review and deployment, and that we will finally have something sane to work with (with access from web browsers!).
I don't consider encoding "撤销由158.64.77.102于2013年1月22日 (二) 16:46的版本24659468中的繁简破坏" (90 bytes using UTF-8) as
"\u64a4\u9500\u7531158.64.77.102\u4e8e2013\u5e741\u670822\u65e5 (\u4e8c) 16:46\u7684\u7248\u672c24659468\u4e2d\u7684\u7e41\u7b80\u7834\u574f" (141 bytes)
to be reasonable at all for a brand-new protocol running over an 8-bit clean channel.
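For the record, the comparison above is reproducible with nothing more than the standard json module (the byte counts include the surrounding JSON quotes):

```python
import json

change = "撤销由158.64.77.102于2013年1月22日 (二) 16:46的版本24659468中的繁简破坏"

escaped = json.dumps(change)                     # default: ensure_ascii=True
raw = json.dumps(change, ensure_ascii=False)

# raw is 90 bytes on the wire, escaped is 141 — matching the counts above.
print(len(raw.encode("utf-8")), len(escaped.encode("utf-8")))

# Both forms decode back to the same string; only the wire size differs.
assert json.loads(escaped) == json.loads(raw) == change
```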
That's your bikeshed, not mine.
-- Victor.