On 03/25/2013 05:16 PM, Bartosz DziewoĆski wrote:
On Mon, 25 Mar 2013 21:23:59 +0100, Steve Newcomb srn@coolheads.com wrote:
If you use a Python interpreter to read JSON data, as many do, anything can happen. I'm not sure that's relevant to Mediawiki, but it could be relevant, particularly in a case where the data may outlive the original software. It's easy to embed a virus in a large JSON dataset. There is no such inherent risk in XML; XML is not a programming language (despite the awkward ways in which XSLT can be abused).
False. This is a feature of some parsers (and which should - and AFAIK is in Python - be disabled by default), which sadly mistake JSON for a data serialization format, when it's merely a data interchange one.
Not sure what's false about what I said. Here's what I was talking about:
#!/usr/bin/env python jsonDataSet = """{ 'this': 'hello', 'that': 'goodbye' }""" exec "myDictionary = %s" % ( jsonDataSet) ## <-- bad but real
Much can happen in an -exec-, including the definition of functions, and their assignment to "self" as methods. And recursive -exec-s, too.
Thse parsers allow certain JSON data (usually with specially formatted keys) to be parsed into arbitrary language constructs in addition to the well-known and expected arrays and maps. But again, this isn't a feature of JSON itself (if anything, it speaks of its versatility), and is as far as I can see completely irrelevant here.
The relevance depends on what people do with the data as represented. It's optimistic to expect knowledgeable, rational behaviors from human data recipients. As with any security concern:
* What's irrelevant is what we *expect* to happen.
* What's relevant is what *could* happen.
But I take your point that it's unlikely to be a problem, except for naive and/or desperate problem-solvers coping with situations and requirements we can scarcely imagine.
I was attempting to point out a different distinction from the one you draw between serialization and interchange (which is a very valid and appropriate distinction to bear in mind, here). I was saying...
JSON syntax is indistinguishable from a subset of Python syntax.
... whereas ...
XML syntax is very distinguishable from the syntax of any programming language.
XML syntax *could* be used for data serialization, but that would be directly contrary to its spirit, because much of a software implementation is implicit in its data structure (whether serialized or not), while XML is best used when the structures of its document instances are driven by, and imply, only the inherent semantics of the data. The reason is: *new* applications that will consume the data, including applications as yet unknown, won't have to compensate for the constraints and assumptions of *existing* applications of the same data.
Fact: From the perspective of an assumed set of existing popular software tools and practices, XML is significantly *less* versatile than JSON.
Fact: From the perspective of human beings, their cultures, their rapidly changing and diverse technological environments, etc., XML is significantly *more* versatile for data interchange than JSON. At a cost.
It all depends on what you're trying to accomplish. Which is more central to your mission: (1) software, or (2) data? You can't have it both ways. No one can serve two masters.
I would argue that even if you ultimately decide to forego all XML in favor of JSON, this discussion is well worth having. Things go better when everybody is on the same page.