Re: [Mediawiki-api] Fwd: a comment w/r/t JSON vs. XML

25 Mar 2013

On 03/25/2013 11:31 AM, Brion Vibber wrote:
...
  XML doesn't so much have a data/metadata distinction so much as it has a
  set of attributes on every element, which makes for a more complex data
  structure than JSON's object graphs. This makes it harder to create a
  common internal->external data structure mapping that works well with
  *both* XML and JSON output.

That's very true.  (I'll show you my scars if you show me yours!)

Even the generic identifier of each element ...

(where "generic identifier" == "class name" == "tag")

... is, in fact, an attribute value.  It's the value of the nameless
attribute.

Ideally, all XML attributes are metadata *about* the data content of
elements, while all data content of elements is the essence and
substance.  But, as already noted, one person's metadata is another
person's data, and the question is always, "Who decides, and on what
basis, and what's the decision?"
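
To make the "who decides" question concrete, here is a minimal Python
sketch.  The record, its field names (pageid, title, text), and the helper
names are hypothetical and not taken from the Mediawiki API.  The point is
only that the JSON writer needs no further decisions, while the XML writer
must be told which fields are attributes and which one is content.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical internal record; the field names are illustrative only.
record = {"pageid": 42, "title": "Main Page", "text": "Welcome!"}


def to_json(rec):
    # JSON: every key simply becomes a property of one object.
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, tag, attribute_keys, content_key):
    # XML: the caller must decide which keys are attributes (metadata)
    # and which single key supplies the element's content (data).
    elem = ET.Element(tag)
    for key in attribute_keys:
        elem.set(key, str(rec[key]))
    elem.text = str(rec[content_key])
    return ET.tostring(elem, encoding="unicode")


json_out = to_json(record)
xml_out = to_xml(record, "page", ["pageid", "title"], "text")
```

A different caller could just as defensibly pass "title" as the content key,
which is exactly the decision the JSON side never has to make.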

-----

I would argue that *all* of the difficulties encountered in maintaining
a *common* data structure fall into one or more of the following categories:

(1) The XML data structure not being fit for purpose.

(2) The object data structure not being fit for purpose.

(3) The two structures not being fit for the *same* purpose.

(4) The object data structure not fully reflecting the data/metadata
distinction that XML requires, and that (not coincidentally) is
reasonably required for the interchange of application-independent data.

Speaking as a programmer, I think #4 is the one that programmers tend to
trip over.  We think in terms of objects and software, rather than in
terms of the information that these objects are intended to convey,
ultimately, to human beings.  Our "customers" are machines, not human
beings who need our data but, for any imaginable or unimaginable future
reason, can't use our software.

The tags and attributes of XML -- indeed all the markup characters
except what SGML calls "STAGO" (<), "ETAGO" (</), and "TAGC" (>) --
should generally be an irrelevant annoyance to anyone who is trying to
get something working ASAP.  It's a pity that the burden of maintaining
XML falls on programmers, because they are the ones who care least about
it, whose productivity suffers because of it, and whose attention to the
underlying reasons for doing a good job with XML usually goes
unrecognized and unrewarded.  (Do I sound bitter?)

Speaking as a businessperson, before I invest in XML representations, I
need to know why, because I know XML will cost real money, one way or
another.  In many scenarios, JSON is cheaper, and anyone who claims
otherwise is ill-informed or lacks deep experience with both of them,
especially in hybrid applications such as Mediawiki.  You guys know this; I'm
preaching to the choir, here, but I want you to know that I, too, sing
in your choir.  Really.

Still speaking as a businessperson, customers do tend to demand XML, and
at least some customers demand it for the right reasons.  Some other
customers demand XML for the wrong reasons, but that's OK because there
are "right reasons" -- benefits to their organizations, and/or to the
public -- that they, in their ignorance, don't recognize.

Still other customers demand XML for no apparent "right reason" --
perhaps out of something akin to brand loyalty.  XML is simply not
always the right answer.  (For example, even after all these years, I'm
still trying to understand why anyone would want to exchange an ODBMS,
or even an RDBMS, for an "XML Database".  But some do!  Go figure.)

Speaking as a scholar with the motives of any data curator, I know that
data objects that lack an embedded perspective on their components are
extremely fragile and short-lived.  Software rots, and often very
quickly indeed.  If I want a corpus of information to be enduringly
accessible, I have to convert it to XML or SGML, and without delay.

...
  Only supporting one or the other means we have a more consistent
  internal API (for the API modules to export data) and a more consistent
  external API (for the consumers of the API).

Very true.  It's cheaper.  Period.  (And you get less.)

...
  As for naming; property names in JSON objects are equivalent to element
  and attribute names in XML, and require human selection in either case.

Not the same.  There is no distinction in JSON between what's meta and
what's not.  In XML, what's meta is in the markup (i.e., it's in the
start-tags and end-tags), and what's not is in the content.  That's the
difference.  Programmers *never* care about the data/metadata
distinction, scholars *always* care about it, and businesspeople must do
whatever the customer wants, or whatever their enterprise requires, at
minimum expense.  (Consultants, such as myself, get to advise all of
them, which is what I'm doing right now.)
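
A minimal sketch of that difference, using only Python's standard library.
The sample documents and function names are made up for illustration.  A
generic reader can split the XML into markup and content without knowing the
vocabulary; for the JSON it can only hand back a bag of properties.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative documents carrying the same three pieces of information.
xml_doc = '<quote speaker="Hamlet" act="3">To be, or not to be</quote>'
json_doc = '{"speaker": "Hamlet", "act": 3, "quote": "To be, or not to be"}'


def split_xml(doc):
    # The format itself says what is markup (tag, attributes)
    # and what is content (the element's text).
    elem = ET.fromstring(doc)
    markup = {"tag": elem.tag, **elem.attrib}
    return markup, elem.text


def split_json(doc):
    # Nothing in the format marks any property as meta or as content;
    # only knowledge of the application can make that call.
    return json.loads(doc)


markup, content = split_xml(xml_doc)
properties = split_json(json_doc)
```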

P.S. XML is pretty secure.  If you use a Python interpreter to read JSON
data by evaluating it as code, as many do, anything can happen.  I'm not
sure that's relevant to Mediawiki, but it could be relevant, particularly
in a case where the data may outlive the original software.  It's easy to
embed a virus in a large JSON dataset that will be read that way.  There is
no such inherent risk in XML; XML is not a programming language (despite
the awkward ways in which XSLT can be abused).
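
The risk arises when JSON text is handed to the interpreter (for example
via eval) rather than to a JSON parser.  A minimal sketch, in which the
hypothetical side_effect function is a harmless stand-in for hostile code:

```python
import json

calls = []  # records whether the embedded code actually ran


def side_effect():
    # Harmless stand-in for whatever an attacker might embed.
    calls.append("executed")
    return 1


# Looks like a JSON object at a glance, but contains a function call.
hostile = '{"n": side_effect()}'


def read_with_eval(text):
    # Dangerous: the text is executed as a Python expression.
    return eval(text)


def read_with_parser(text):
    # A real JSON parser treats the text purely as data.
    return json.loads(text)
```

Calling read_with_eval(hostile) runs side_effect, whereas Python's
json.loads rejects the same text instead of running it.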

P.P.S. My point is: Is the focus of your product software?  Or is the
focus data?  If it's data, then make the software conform to the
requirements of the data.  If it's software (e.g., the API), then you
should feel quite free to make the data conform to the requirements of
the software.  (But I find it hard to believe that the latter case is
the Mediawiki case, actually.)

Steve Newcomb
