Re: [Wikitech-l] TechCom topics 2020-11-04 (fixed)

11 Nov 2020

On Tue, Nov 10, 2020 at 5:50 PM Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...
  On Tue, Nov 3, 2020 at 1:59 AM Daniel Kinzler <dkinzler(a)wikimedia.org> wrote:

  TemplateData already uses JSON serialization, but then compresses the
 JSON output, to make the data fit into the page_props table. This results
 in binary data in ParserOutput, which we can't directly put into JSON.

 I'm not sure I understand the problem. Binary data can be trivially
 represented as JSON, by treating it as a string. Is it an issue of storage
 size? JSON escaping of the control characters is (assuming binary data with
 a somewhat random distribution of bytes) an ~50% size increase, UTF-8
 encoding the top half of bytes is another 50%, so it will approximately
 double the length - certainly worse than the ~33% increase for base64, but
 not tragic. (And if size increase matters that much, you probably shouldn't
 be using base64 either.)
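
The size estimate above can be checked with a short sketch. It is written in Python purely for illustration (the thread itself concerns PHP), and it assumes "treating it as a string" means mapping each byte to the code point of the same value (Latin-1) before JSON encoding. The variable names are hypothetical, and one copy of each of the 256 byte values stands in for uniformly distributed binary data.

```python
import base64
import json

# One of each byte value: a stand-in for uniformly distributed binary data.
raw = bytes(range(256))

# Assumption: interpret each byte as the code point of the same value (Latin-1).
text = raw.decode("latin-1")

# ensure_ascii=False leaves code points 0x80-0xFF as literal characters, so
# only control characters, the double quote and the backslash get escaped.
encoded = json.dumps(text, ensure_ascii=False).encode("utf-8")

payload_len = len(encoded) - 2          # ignore the surrounding quotes
ratio = payload_len / len(raw)          # roughly 2x
b64_len = len(base64.b64encode(raw))    # roughly 1.33x
b64_ratio = b64_len / len(raw)

# The mapping is reversible, so the original bytes can be recovered.
restored = json.loads(encoded.decode("utf-8")).encode("latin-1")

print(f"JSON string: {ratio:.2f}x, base64: {b64_ratio:.2f}x")
```

Under these assumptions the JSON string comes out at about twice the input size and base64 at about 1.34 times, in line with the figures quoted above. Note that this relies on an explicit Latin-1 transcoding step; the raw bytes are not passed to the JSON encoder directly.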

The binary aspect here refers to the gzip output buffer. While it is
represented in PHP as a string, that string is not valid UTF-8 and so
cannot be encoded as JSON. Attempting to do so produces a PHP JSON error,
and boolean false is returned.

Condensed example: https://3v4l.org/cJttU
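
For readers who cannot open the link, the same effect can be sketched outside PHP. The following is a Python analogue, not the linked example: it assumes a small hypothetical payload, shows that gzip output is not valid UTF-8 (the stream begins with the magic bytes 0x1f 0x8b, and 0x8b cannot start a UTF-8 sequence), and shows base64 as one possible way to carry the bytes inside JSON.

```python
import base64
import gzip
import json

# Hypothetical stand-in for serialized template data.
payload = json.dumps({"description": "example template data"}).encode("utf-8")
compressed = gzip.compress(payload)

# A gzip stream starts with 0x1f 0x8b; the second byte is a bare UTF-8
# continuation byte, so decoding the buffer as UTF-8 fails.
try:
    compressed.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# One possible workaround: wrap the bytes in base64, which is plain ASCII.
wrapped = json.dumps({"data": base64.b64encode(compressed).decode("ascii")})

# Reverse the steps to confirm nothing was lost.
restored = gzip.decompress(base64.b64decode(json.loads(wrapped)["data"]))

print(is_valid_utf8, restored == payload)
```

Whether base64 or some other representation is the right choice for ParserOutput is a separate question; the sketch only shows why the raw buffer cannot be handed to a JSON encoder as-is.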
