XML doesn't have a data/metadata distinction so much as it has a set of attributes on every element, which makes for a more complex data structure than JSON's object graphs. This makes it harder to create a common internal->external data structure mapping that works well with *both* XML and JSON output.
Only supporting one or the other means we have a more consistent internal API (for the API modules to export data) and a more consistent external API (for the consumers of the API).
As for naming: property names in JSON objects are equivalent to element and attribute names in XML, and require human selection in either case.
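To make the mapping problem concrete, here is a minimal sketch using only the Python standard library. The `page` record, its field names, and the `to_json`/`to_xml` helpers are hypothetical illustrations, not the actual MediaWiki API internals. The point is only that the JSON path needs no extra decisions, while the XML path has to be told which fields become attributes and what to call list items.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical internal record, as an API module might export it.
page = {"pageid": 42, "title": "Main Page", "categories": ["Help", "Portal"]}


def to_json(record):
    # JSON: the dict maps onto an object directly; no further choices needed.
    return json.dumps(record, sort_keys=True)


def to_xml(record, attribute_keys):
    # XML: each scalar field must be assigned to either an attribute or a
    # child element, and list items need an element name picked by a human
    # ("item" here is an arbitrary choice).
    root = ET.Element("page")
    for key, value in record.items():
        if isinstance(value, list):
            container = ET.SubElement(root, key)
            for item in value:
                ET.SubElement(container, "item").text = str(item)
        elif key in attribute_keys:
            root.set(key, str(value))
        else:
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_out = to_json(page)
# Two equally defensible XML shapes for the same record:
xml_attrs = to_xml(page, attribute_keys={"pageid", "title"})
xml_elems = to_xml(page, attribute_keys=set())
```

Both XML outputs carry the same information, but a consumer has to know which shape was chosen, and the values come back as strings in either case.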
-- brion
On Sun, Mar 24, 2013 at 11:54 AM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
At the Semantic MediaWiki conference (SMWCon) a few days ago, Yuri mentioned that we're considering making our web API JSON-only. In response, Steve Newcomb emailed me the message below, and gave me permission to forward it to mediawiki-api for your consideration. Thank you, Steve Newcomb.
-- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation
Dear Ms. Harihareswara,
The remarks that appear below, after my signature, are informed by participation in years of earnest presentations and discussions about XML vs. JSON at the Balisage conferences (see balisage.org).
The remarks below are extracted from the documentation of a tool we use in our consulting practice, which includes data management/publishing services for U.S. government customers. The extract is from a discussion of how the tool can optionally format XML for human-readability *without* polluting the data with spurious new whitespace. Then it digresses to more general considerations in a NOTE which is directly relevant to the JSON vs. XML question.
It all boils down to a simple question: will the data ever be used outside its current known applications and/or software? If the answer is "No", then JSON is probably the right choice. If the answer is "Yes", then XML is certainly a better choice, but then the questions arise: "Whose perspective on the data should be baked into it?", and "Who will pay the cost of baking it in?"
All best wishes for you and for humanity's ongoing invention of civilization, which depends on the longevity of knowledge,
Steve Newcomb srn@coolheads.com
In consideration of the haphazard way in which XML data are sometimes processed in the real world, one may with some justification worry about how a given XML document may someday be understood, especially when whitespace is significant. [This tool's] use of markup characters for all readability-whitespace moots the criticism of XML that JSON is easier than XML to read and use for data interchange on account of the fact that, in JSON, all whitespace is intrinsically explicit and not subject to subsequent diddling when parsed, even when JSON data are elegantly formatted for readability.
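The whitespace concern can be illustrated with a small sketch using the Python standard library. It does not reproduce the tool described above, whose implementation is not shown here; it only demonstrates the problem that tool is said to avoid. The sample document and variable names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET
from xml.dom import minidom

# XML: conventional pretty-printing inserts whitespace as new text nodes.
compact = "<name><first>Ada</first><last>Lovelace</last></name>"
pretty = minidom.parseString(compact).toprettyxml(indent="  ")

# Concatenate all character data seen by a parser before and after formatting.
text_before = "".join(ET.fromstring(compact).itertext())
text_after = "".join(ET.fromstring(pretty).itertext())

# JSON: indentation lives outside the string values, so a round trip through
# a pretty-printed form returns exactly the same data.
record = {"name": "Ada  Lovelace\n"}
round_tripped = json.loads(json.dumps(record, indent=2))
```

Here `text_before` is `"AdaLovelace"`, while `text_after` also contains the line breaks and indentation added for readability; `round_tripped` equals `record`. Whether the extra XML whitespace matters depends on whether the consuming application treats it as significant.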
Note: Needless to say, both syntaxes, XML and JSON, have advantages and disadvantages. In the context of this discussion, it may be worthwhile to highlight the essential difference between JSON and XML, which is that XML provides (demands, really) an explicit distinction between data and data-about-data (metadata), while JSON does not.
In other words, XML requires specific classes of things to be endowed with names, while JSON imposes no such constraint. XML offers a standard way of unambiguously distinguishing the names of classes of data, and the names of attributes of those classes, from the data themselves. These names must be chosen somehow. Normally, the chosen names are meaningful. The choice of a specific name by a human being is the making of a semantic commitment. Thus, in XML, data are expressed in a way that almost inevitably reflects how someone (perhaps even the author!) thought the data should, or at least could, be understood. JSON, by contrast, does not demand that such a perspective be explicitly embedded in the data. If such a perspective is embedded in JSON data, JSON does not provide a standard way of abstracting that perspective from the data.
But neither syntax prohibits the processing of data in terms of a data/metadata perspective other than the one(s) that were embedded in them. Whatever information XML can convey, JSON can also convey, and vice versa. However, if a data/metadata distinction needs to be baked into the data, such as when the data may need to be understood by a human being apart from any specific software application, XML is simpler to use, and the baked-in data/metadata distinction will be universally understandable as such, not only because of the World Wide Web Consortium XML Recommendation, but also because of ISO International Standard 8879-1986, as amended.
If a baked-in data/metadata distinction is not desired, JSON is pretty clearly the better choice, but then at least two questions arise: (1) Are you certain that an embedded data/metadata distinction will be undesirable for all future applications of these data, including applications that do not yet exist? (2) Are you certain that you wish to forgo your opportunity to influence how these data will be understood, including by persons as yet unborn?
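The note's contrast between named and unnamed data can be sketched as follows, again with only the Python standard library. The element and attribute names (`pages`, `page`, `id`, `title`) are illustrative choices and not a real schema. JSON objects can of course carry names too, as the reply at the top of this thread points out; the sketch only shows that JSON does not require them, whereas XML does.

```python
import json
import xml.etree.ElementTree as ET

# JSON accepts purely positional data: nothing in the syntax says what
# 42 or "Main Page" mean.
unnamed = json.loads('[[42, "Main Page"], [43, "Help"]]')

# The same rows in XML: every value sits under a name someone had to choose.
named = ET.fromstring(
    "<pages>"
    '<page id="42"><title>Main Page</title></page>'
    '<page id="43"><title>Help</title></page>'
    "</pages>"
)

# Generic code can separate the names (metadata) from the values (data)
# without knowing the vocabulary in advance.
names = sorted(
    {el.tag for el in named.iter()}
    | {attr for el in named.iter() for attr in el.attrib}
)
values = [(p.get("id"), p.findtext("title")) for p in named.iter("page")]
```

After running this, `names` holds the vocabulary the XML author committed to, and `values` holds the data; the JSON array in `unnamed` carries the same values with no vocabulary at all.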
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api