On 03/25/2013 05:16 PM, Bartosz Dziewoński wrote:
On Mon, 25 Mar 2013 21:23:59 +0100, Steve Newcomb <srn(a)coolheads.com> wrote:
If you use a Python interpreter to read JSON
data, as many do, anything can happen. I'm not sure that's relevant to
Mediawiki, but it could be relevant, particularly in a case where the
data may outlive the original software. It's easy to embed a virus in a
large JSON dataset. There is no such inherent risk in XML; XML is not a
programming language (despite the awkward ways in which XSLT can be
abused).
False. This is a feature of some parsers (one that should be disabled by
default, and AFAIK is in Python), which sadly mistake JSON for a
data serialization format, when it's merely a data interchange one.
Not sure what's false about what I said. Here's what I was talking about:
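#!/usr/bin/env python
# The "data" below is valid JSON and, at the same time, valid Python source.
jsonDataSet = """{
    "this": "hello",
    "that": "goodbye"
}"""
# The call form of exec works in both Python 2 and Python 3.
exec("myDictionary = %s" % (jsonDataSet))  ## <-- bad but real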
Much can happen in an -exec-, including the definition of functions, and
their assignment to "self" as methods. And recursive -exec-s, too.
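To make the risk concrete, here is a minimal sketch contrasting the -exec- pattern above with a real JSON parser. The payload, the `log` list, and the two loader functions are hypothetical names invented for this illustration; only `exec` and `json.loads` are standard Python.

```python
import json

# Hypothetical payload posing as data. The second value is not a string
# literal but an expression that calls a method before yielding "goodbye".
payload = '{"this": "hello", "that": log.append("code ran") or "goodbye"}'


def load_with_exec(text, log):
    """The risky pattern: the text is executed as Python source."""
    namespace = {"log": log}
    exec("myDictionary = %s" % text, namespace)
    return namespace["myDictionary"]


def load_with_json(text):
    """A JSON parser only recognizes JSON grammar and never runs the input."""
    return json.loads(text)


log = []
result = load_with_exec(payload, log)  # side effect: log is modified

try:
    load_with_json(payload)
    rejected = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True
```

After this runs, `result` looks like an ordinary dictionary, yet `log` shows that code embedded in the "data" was executed during loading. The JSON parser refuses the same payload outright.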
These parsers allow certain JSON data (usually with specially formatted
keys) to be parsed into arbitrary language constructs in addition to the
well-known and expected arrays and maps. But again, this isn't a feature
of JSON itself (if anything, it speaks of its versatility), and is as
far as I can see completely irrelevant here.
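As a hedged illustration of the parser feature described here, the sketch below uses the `object_hook` parameter of Python's `json.loads`. The `"__complex__"` key convention and the `as_complex` function are assumptions made up for this example; nothing in JSON itself gives that key any meaning, and the hook does nothing unless the caller passes it in.

```python
import json


def as_complex(obj):
    """Revive objects marked with the (hypothetical) "__complex__" key."""
    if obj.get("__complex__") is True:
        return complex(obj["real"], obj["imag"])
    return obj


text = '{"value": {"__complex__": true, "real": 1, "imag": 2}}'

# Default behavior: only dicts, lists, strings, numbers, booleans, None.
plain = json.loads(text)

# Opt-in behavior: the hook sees every decoded object and may replace it.
revived = json.loads(text, object_hook=as_complex)
```

With the default call, `plain["value"]` stays an ordinary dictionary. Only the explicit `object_hook` turns the specially formatted keys into another language construct.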
The relevance depends on what people do with the data as represented.
It's optimistic to expect knowledgeable, rational behaviors
from human data recipients. As with any security concern:
* What's irrelevant is what we *expect* to happen.
* What's relevant is what *could* happen.
But I take your point that it's unlikely to be a problem, except for
naive and/or desperate problem-solvers coping with situations and
requirements we can scarcely imagine.
I was attempting to point out a different distinction from the one you
draw between serialization and interchange (which is a very valid and
appropriate distinction to bear in mind, here). I was saying...
JSON syntax is indistinguishable from a subset of Python syntax.
... whereas ...
XML syntax is very distinguishable from the syntax of any programming
language.
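A small sketch of that distinction, using Python's own parser as the judge. The sample strings and the helper name `parses_as_python_expression` are hypothetical; `ast.parse`, `ast.literal_eval`, and `json.loads` are standard library calls. Note one caveat: `true`, `false`, and `null` are accepted by the Python grammar only as names, not as values, so the overlap is syntactic.

```python
import ast
import json


def parses_as_python_expression(text):
    """Return True if the Python grammar accepts the text as an expression."""
    try:
        ast.parse(text, mode="eval")
        return True
    except SyntaxError:
        return False


# Hypothetical sample documents carrying roughly the same information.
json_text = '{"this": "hello", "that": [1, 2.5, true, null]}'
xml_text = '<greeting this="hello"><that>goodbye</that></greeting>'

json_is_python = parses_as_python_expression(json_text)
xml_is_python = parses_as_python_expression(xml_text)

# For JSON without true/false/null, evaluating it as Python literals
# gives the same structure as a JSON parser does.
plain_json = '{"this": "hello", "that": "goodbye"}'
same_result = ast.literal_eval(plain_json) == json.loads(plain_json)
```

The JSON sample is accepted by the Python grammar, which is what makes the -exec- shortcut tempting. The XML sample is a syntax error from its first character, so nobody can hand it to the interpreter by accident.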
XML syntax *could* be used for data serialization, but that would be
directly contrary to its spirit, because much of a software
implementation is implicit in its data structure (whether serialized or
not), while XML is best used when the structures of its document
instances are driven by, and imply, only the inherent semantics of the
data. The reason is: *new* applications that will consume the data,
including applications as yet unknown, won't have to compensate for the
constraints and assumptions of *existing* applications of the same data.
Fact: From the perspective of an assumed set of existing popular
software tools and practices, XML is significantly *less* versatile than
JSON.
Fact: From the perspective of human beings, their cultures, their
rapidly changing and diverse technological environments, etc., XML is
significantly *more* versatile for data interchange than JSON. At a cost.
It all depends on what you're trying to accomplish. Which is more
central to your mission: (1) software, or (2) data? You can't have it
both ways. No one can serve two masters.
I would argue that even if you ultimately decide to forgo all XML in
favor of JSON, this discussion is well worth having. Things go better
when everybody is on the same page.