On 03/25/2013 11:31 AM, Brion Vibber wrote:
XML doesn't so much have a data/metadata distinction as it has a set of attributes on every element, which makes for a more complex data structure than JSON's object graphs. This makes it harder to create a common internal->external data structure mapping that works well with *both* XML and JSON output.
That's very true. (I'll show you my scars if you show me yours!)
Even the generic identifier of each element ...
(where "generic identifier" == "class name" == "tag")
... is, in fact, an attribute value. It's the value of the nameless attribute.
Ideally, all XML attributes are metadata *about* the data content of elements, while all data content of elements is the essence and substance. But, as already noted, one person's metadata is another person's data, and the question is always, "Who decides, and on what basis, and what's the decision?"
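To make Brion's point about attributes concrete, here is a minimal sketch that maps an XML element onto a JSON-style dict using one common ad hoc convention: attributes become "@name" keys, text becomes a "#text" key, and child elements are grouped in lists by tag. The function name element_to_dict, the convention itself, and the sample <page> element are all hypothetical illustrations, not anything Mediawiki actually does.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Map an element to a dict under one hypothetical convention:
    attributes -> "@name" keys, text -> "#text", children -> lists keyed by tag.
    Tail text (mixed content) is silently dropped."""
    result = {"@" + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text
    for child in elem:
        # Every child becomes a list entry, even if its tag occurs only once.
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result

xml_text = '<page id="42" ns="0"><title>Main Page</title><rev id="7">Hello</rev></page>'
root = ET.fromstring(xml_text)
print(json.dumps({root.tag: element_to_dict(root)}))
```

Nearly every line encodes a choice someone had to make (the "@" and "#text" prefixes, always-a-list children, discarding tail text), and a different choice yields a different JSON shape for the same XML. That arbitrariness is exactly what makes a single internal structure hard to serve well in both formats.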
-----
I would argue that *all* of the difficulties encountered in maintaining a *common* data structure fall into one or more of the following categories:
(1) The XML data structure not being fit for purpose.
(2) The object data structure not being fit for purpose.
(3) The two structures not being fit for the *same* purpose.
(4) The object data structure not fully reflecting the data/metadata distinction that XML requires, and that (not coincidentally) is reasonably required for the interchange of application-independent data.
Speaking as a programmer, I think #4 is the one that programmers tend to trip over. We think in terms of objects and software, rather than in terms of the information that these objects are intended to convey, ultimately, to human beings. Our "customers" are machines, not human beings who need our data but, for any imaginable or unimaginable future reason, can't use our software.
The tags and attributes of XML -- indeed all the markup characters except what SGML calls "STAGO" (<), "ETAGO" (</), and "TAGC" (>) -- should generally be an irrelevant annoyance to anyone who is trying to get something working ASAP. It's a pity that the burden of maintaining XML falls on programmers, because they are the ones who care least about it, whose productivity suffers because of it, and whose attention to the underlying reasons for doing a good job with XML usually goes unrecognized and unrewarded. (Do I sound bitter?)
Speaking as a businessperson, before I invest in XML representations, I need to know why, because I know XML will cost real money, one way or another. In many scenarios, JSON is cheaper, and anyone who claims otherwise is ill-informed or lacks deep experience with both of them, especially in hybrid applications such as Mediawiki. You guys know this; I'm preaching to the choir here, but I want you to know that I, too, sing in your choir. Really.
Still speaking as a businessperson, customers do tend to demand XML, and at least some customers demand it for the right reasons. Some other customers demand XML for the wrong reasons, but that's OK because there are "right reasons" -- benefits to their organizations, and/or to the public -- that they, in their ignorance, don't recognize.
Still other customers demand XML for no apparent "right reason" -- perhaps out of something akin to brand loyalty. XML is simply not always the right answer. (For example, even after all these years, I'm still trying to understand why anyone would want to exchange an ODBMS, or even an RDBMS, for an "XML Database". But some do! Go figure.)
Speaking as a scholar with the motives of any data curator, I know that data objects that lack an embedded perspective on their components are extremely fragile and short-lived. Software rots, and often very quickly indeed. If I want a corpus of information to be enduringly accessible, I have to convert it to XML or SGML, and without delay.
Only supporting one or the other means we have a more consistent internal API (for the API modules to export data) and a more consistent external API (for the consumers of the API).
Very true. It's cheaper. Period. (And you get less.)
As for naming, property names in JSON objects are equivalent to element and attribute names in XML, and require human selection in either case.
Not the same. There is no distinction in JSON between what's meta and what's not. In XML, what's meta is in the markup (i.e., it's in the start-tags and end-tags), and what's not is in the content. That's the difference. Programmers *never* care about the data/metadata distinction, scholars *always* care about it, and businesspeople must do whatever the customer wants, or whatever their enterprise requires, at minimum expense. (Consultants, such as myself, get to advise all of them, which is what I'm doing right now.)
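A small sketch of that difference, under the assumption that "lang" and "status" are the metadata and the sentence is the content. In the XML version the markup itself says which is which; in the JSON version all three keys are siblings, so the split has to be supplied out of band. The function names and the metadata_keys parameter are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def split_xml(xml_text):
    """Split an element into (metadata, content) using the markup itself."""
    elem = ET.fromstring(xml_text)
    # Attributes live in the start-tag; the text lives in the element content.
    return dict(elem.attrib), elem.text

def split_json(json_text, metadata_keys):
    """Split a JSON object into (metadata, content).

    JSON syntax gives no hint which keys are metadata, so the caller must
    supply that knowledge separately via metadata_keys."""
    obj = json.loads(json_text)
    meta = {k: v for k, v in obj.items() if k in metadata_keys}
    data = {k: v for k, v in obj.items() if k not in metadata_keys}
    return meta, data

xml_meta, xml_content = split_xml('<para lang="en" status="draft">The quick brown fox.</para>')
json_meta, json_content = split_json(
    '{"lang": "en", "status": "draft", "text": "The quick brown fox."}',
    metadata_keys={"lang", "status"},  # hypothetical, agreed on outside the data
)
```

Both calls recover the same metadata, but only the XML version carries that knowledge with the data; lose the metadata_keys convention and the JSON document no longer says which parts are "about" the content.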
P.S. XML is pretty secure. If you read JSON data by handing it to a Python interpreter (for example, with eval()), as many do, anything can happen. I'm not sure that's relevant to Mediawiki, but it could be relevant, particularly in a case where the data may outlive the original software. It's easy to embed a virus in a large JSON dataset. There is no such inherent risk in XML; XML is not a programming language (despite the awkward ways in which XSLT can be abused).
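A minimal sketch of the risk, assuming "using a Python interpreter to read JSON" means evaluating the text as Python code. The payload and function names are hypothetical; the harmless os.getpid() call stands in for whatever a hostile dataset might actually run.

```python
import json

# Looks like JSON, but the "pid" value is a Python expression, not a JSON literal.
PAYLOAD = '{"title": "Main Page", "pid": __import__("os").getpid()}'

def read_with_eval(text):
    # Unsafe: eval() executes any expression embedded in the text.
    return eval(text)

def read_with_json(text):
    # json.loads() accepts only JSON syntax and raises on anything else.
    return json.loads(text)

print(read_with_eval(PAYLOAD))  # the embedded function call actually runs
try:
    read_with_json(PAYLOAD)
except json.JSONDecodeError as err:
    print("rejected:", err)
```

The eval() reader hands back a dict whose "pid" was computed by running code from the data; the json.loads() reader refuses the same text. The danger, in other words, lies in treating data as a program.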
P.P.S. My point is: Is the focus of your product software? Or is the focus data? If it's data, then make the software conform to the requirements of the data. If it's software (e.g., the API), then you should feel quite free to make the data conform to the requirements of the software. (But I find it hard to believe that the latter case is the Mediawiki case, actually.)
Steve Newcomb