Dear all,
There have been some interesting discussions about breaking changes
here, but before we continue in this direction, let me repeat that I did
not start this thread to define what is a "breaking change" in JSON.
There are JSON libraries that define this in a strict way (siding with
Peter) and browsers that are more tolerant (siding with Daniel). I don't
think we can come to definite conclusions here. Format versioning, as
Stas suggests, can't be a bad thing.
However, all I was asking for was to get a little email when the JSON is
changed. There is no need to discuss whether this is really required by
some higher principle. Even if my software tolerates the change, I
should *always* know about new information being available. It is
usually there for a purpose, so my software should do better than "not
breaking".
Lydia has already confirmed early on that suitable notification emails
should be sent in the future, so I don't see a need to continue this
particular discussion. Daniel's position seemed to be a mix of "I told
you so" and "you volunteers should write better code", which is of
little help to me or my users. It would be good to rethink how to
approach the community in such cases, to make sure that a coherent and
welcoming message is sent to contributors. (On that note, all the best
on your new job, Léa! -- communicating with this crowd can be a
challenge at times ;-).
Markus
On 11.08.2016 22:35, Stas Malyshev wrote:
Hi!
My view is that this tool should be extremely cautious when it sees new data
structures or fields. The tool should certainly not continue to output
facts without some indication that something is suspect, and preferably
should refuse to produce output under these circumstances.
I don't think I agree. I find tools that are too picky about details
that are not important to me hard to use, and I'd very much prefer a
tool where I am in control of which information I need and which I don't
need.
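
To illustrate what such a tool could look like, here is a minimal Python
sketch of a reader that picks only the fields the caller asks for, ignores
the rest, and still reports keys it has never seen. The function name
read_claim and the key set KNOWN_CLAIM_KEYS are made up for this example and
are not an official schema or any existing tool's API.

```python
import json

# Hypothetical set of claim keys this consumer understands (illustrative only).
KNOWN_CLAIM_KEYS = {"id", "type", "mainsnak", "rank"}


def read_claim(raw, wanted=("id", "mainsnak")):
    """Return the wanted fields plus a sorted list of keys unknown to this reader."""
    claim = json.loads(raw)
    # Keys outside the known set are reported, not treated as a fatal error.
    unknown = sorted(set(claim) - KNOWN_CLAIM_KEYS)
    # The caller decides which fields matter; everything else is skipped.
    picked = {key: claim[key] for key in wanted if key in claim}
    return picked, unknown


picked, unknown = read_claim('{"id": "Q42$1", "mainsnak": {}, "newfield": 1}')
```

The caller stays in control of what it consumes, and the unknown list can
be logged or mailed to a maintainer without stopping the run.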
What can happen if the tool instead continues to operate without complaint
when new data structures are seen? Consider what would happen if the tool
was written for a version of Wikidata that didn't have rank, i.e., claim
objects did not have a rank name/value pair. If ranks were then added,
consumers of the output of the tool would have no way of distinguishing
deprecated information from other information.
Ranks are a bit unusual because adding them is not just an informational
change, it's a semantic change. It introduces the concept of a statement that
has different semantics from the rest. Of course, such a change needs to be
communicated - it's as if I made the format change "each string
beginning with letter X needs to be read backwards" but didn't tell the
clients. Of course this is a breaking change if it changes semantics.
What I was talking about are changes that don't break semantics, and the
majority of additions are just that.
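
The rank case can be made concrete with a small Python sketch. It assumes
simplified statement dicts with an optional "rank" key whose values are
"preferred", "normal" or "deprecated"; the helper name usable_statements is
made up for this example. A consumer written before ranks existed would
have passed all three statements through, which is exactly the semantic
problem described above.

```python
def usable_statements(statements):
    """Drop deprecated statements; a missing rank is treated as 'normal'."""
    return [s for s in statements if s.get("rank", "normal") != "deprecated"]


statements = [
    {"id": "s1", "rank": "normal"},
    {"id": "s2", "rank": "deprecated"},
    {"id": "s3"},  # shape of a statement from before ranks were introduced
]

kept_ids = [s["id"] for s in usable_statements(statements)]
```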
Of course this is an extreme case. Most changes to the Wikidata JSON dump
format will not cause such severe problems. However, given the current
situation with how the Wikidata JSON dump format can change, the tool cannot
determine whether any particular change will affect the meaning of what it
produces. Under these circumstances it is dangerous for a tool that
extracts information from the Wikidata JSON dump to continue to produce
output when it sees new data structures.
The tool cannot. It's not possible to write a tool that would derive
semantics just from the JSON dump, or even detect semantic changes. Semantic
changes can be anywhere; it doesn't have to be an additional field - it can
be in the form of changing the meaning of a field, or its format, or
datatype, etc. Of course the tool cannot know that - people should know
that and communicate it. Again, that's why I think we need to
distinguish changes that break semantics and changes that don't, and
make the tools robust against the latter - but not the former because
it's impossible. For dealing with the former, there is a known and
widely used solution - format versioning.
This does make consuming tools sensitive to changes to the Wikidata JSON
dump format that are "non-breaking". To overcome this problem there should
be a way for tools to distinguish changes to the Wikidata JSON dump format
that do not change the meaning of existing constructs in the dump from those
that can. Consuming tools can then continue to function without problems
for the former kind of change.
As I said, format versioning. Maybe even semver or some suitable
modification of it. RDF exports, BTW, already carry a version. Maybe JSON
exports should too.
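
As a rough Python sketch of how a consumer could use such a version,
assuming a hypothetical semver-style "MAJOR.MINOR.PATCH" string shipped with
the dump (the constant, the function names and the returned labels are
invented for this example): a different major version signals a semantic
break and the tool refuses, a newer minor version signals additions and the
tool only warns.

```python
SUPPORTED_MAJOR = 1  # hypothetical major version this consumer was written for
KNOWN_MINOR = 0      # hypothetical newest minor version it has been tested with


def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def check_format_version(text, supported_major=SUPPORTED_MAJOR, known_minor=KNOWN_MINOR):
    """Return 'ok', 'warn' (new additions) or 'refuse' (semantic break)."""
    major, minor, _patch = parse_version(text)
    if major != supported_major:
        return "refuse"  # meaning of existing constructs may have changed
    if minor > known_minor:
        return "warn"    # additions only; existing fields keep their meaning
    return "ok"
```

For example, check_format_version("1.2.0") gives "warn" and
check_format_version("2.0.0") gives "refuse".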