On 08/05/2016 06:46 AM, Daniel Kinzler wrote:
> On 05.08.2016 at 15:02, Peter F. Patel-Schneider wrote:
>> I side firmly with Markus here.
>> Consumers of data generally cannot tell whether the addition of a new field to a data encoding is a breaking change or not.
> Without additional information, they cannot know, though for "mix and match" formats like JSON and XML, it's common practice to assume that ignoring additions is harmless.
The assumption that ignoring additions is harmless is a very dangerous practice, even if it is common.
> In any case, we had communicated before that we do not consider the addition of a field a breaking change. It only becomes a breaking change when it impacts the interpretation of other fields. In that case we would announce it well in advance.
So some additions are breaking changes, then. What is a system that consumes this information supposed to do? If the system doesn't monitor announcements, it has to assume that any new field could be a breaking change, and thus should not accept data that has any new fields.
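The strict policy argued for here can be sketched in a few lines of Python. This is only an illustration: the field names and the `strict_decode` helper are hypothetical, not any real Wikidata schema.

```python
import json

# Hypothetical whitelist of fields this client understands; the names
# are illustrative, not the actual Wikidata dump schema.
KNOWN_FIELDS = {"id", "type", "labels"}

def strict_decode(text):
    """Decode a JSON object, refusing data that contains unknown fields."""
    record = json.loads(text)
    unknown = set(record) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields, refusing input: {sorted(unknown)}")
    return record

strict_decode('{"id": "Q1", "type": "item"}')   # accepted
# strict_decode('{"id": "Q1", "extra": 0}')     # would raise ValueError
```

A real client would likely log the offending record rather than abort outright, but the principle is the same: unknown fields are treated as potentially meaning-changing.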
>> Given this, code that consumes encoded data should at least produce warnings when it encounters encodings that it is not expecting, and preferably should refuse to produce output in such circumstances.
> Depends on the circumstances. For a web browser, for example, this would be very annoying behavior. Nearly all websites would be unusable. Similarly, most email would become unreadable if mail clients were that strict.
I assume that you are referring to the common practice of adding extra fields in HTTP and email transport and header structures under the assumption that these extra fields will just be passed on to downstream systems and then silently ignored when content is displayed. I view these as special cases where there is at least an implicit contract that no additional field will change the meaning of the existing fields and data. When such contracts are in place systems can indeed expect to see additional fields, and are permitted to ignore these extra fields.
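The pass-through convention described here can be sketched as follows. The header names and the `forward`/`render` split are hypothetical, chosen only to illustrate the implicit contract: unknown headers are forwarded untouched, and only the headers the display layer knows are rendered.

```python
# Hypothetical set of headers this client knows how to display.
KNOWN_DISPLAY_HEADERS = {"From", "To", "Subject"}

def forward(headers):
    # Pass every header on downstream, known or not.
    return dict(headers)

def render(headers):
    # Display only the headers this client understands; extra headers
    # are ignored because the contract says they cannot change the
    # meaning of the known ones.
    return {k: v for k, v in headers.items() if k in KNOWN_DISPLAY_HEADERS}

msg = {"From": "a@example.org", "Subject": "hi", "X-Future-Extension": "1"}
assert forward(msg) == msg
assert render(msg) == {"From": "a@example.org", "Subject": "hi"}
```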
>> Producers of data thus should signal in advance any changes to the encoding, even if they know that the changes can be safely ignored.
> I disagree on "any". For example, do you want announcements about changes to the order of attributes in XML tags?
No.
Why?
Because XML specifically states that the order of attributes is not significant. Therefore changing the order of XML attributes does not change the encoding.
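This is easy to demonstrate with a standard XML parser: a consumer that uses the XML data model never sees the serialized attribute order. (The element and attribute names below are illustrative.)

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in attribute order.
a = ET.fromstring('<item id="Q1" lang="en"/>')
b = ET.fromstring('<item lang="en" id="Q1"/>')

# The parser exposes attributes as a mapping, so the two documents are
# indistinguishable to a conforming client.
assert a.attrib == b.attrib == {"id": "Q1", "lang": "en"}
```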
> In case someone uses a regex to process the XML? Should you not be able to rely on your clients conforming to the XML spec, which says that the order of attributes is undefined?
Yes indeed. And there would be no problem in changing the order of entities in the JSON dump as this order is deemed to be insignificant in well-behaved JSON texts.
> In the case at hand (adding a field), it would have been good to communicate it in advance. But since it wasn't tagged as "breaking", it slipped through. We are sorry for that. Clients should still not choke on an addition like this.
Here is where I disagree. As there is no contract that new fields in the Wikidata JSON dumps are not breaking, clients need to treat all new fields as potentially breaking and thus should not accept data with unknown fields.
>> I would view software that consumes Wikidata information and silently ignores fields that it is not expecting as deficient and would counsel against using such software.
> Is this just for Wikidata, or does that extend to other kinds of data too? Why, or why not?
I say this for any data, except where there is a contract that such additional fields are not meaning-changing.
> By definition, any extensible format or protocol (HTTP, SMTP, HTML, XML, XMPP, IRC, etc.) can contain parts (headers, elements, attributes) that the client does not know about and should ignore. Of course, the spec will tell clients where to expect and allow extra bits.
Yes, these standards have explicit wording that there are certain places where additional bits are allowed, and that these additional bits can be safely ignored. Consumers of data in these standards can verify that the data has not been corrupted and then safely ignore extra bits in certain places, because they have a contract that the encoding of the data that they care about is not affected by these extra bits. However, I don't see this contract with respect to the Wikidata JSON encoding.
> That's why I'm planning to put up a document saying clearly what kinds of changes clients should be prepared to see in Wikidata output:
> Clients need to be prepared to encounter entity types and data types they don't know. But they should also allow additional fields in any JSON object. We guarantee that extra fields do not impact the interpretation of fields they know about - unless we have announced and documented a breaking change.
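A client written to this proposed contract might look like the following sketch. `KNOWN_ENTITY_TYPES` and the field names are hypothetical, not the actual Wikidata dump schema; the point is the two behaviors the contract asks for — skip unknown entity types, tolerate extra fields on known ones.

```python
import json
import warnings

# Hypothetical set of entity types this client understands.
KNOWN_ENTITY_TYPES = {"item", "property"}

def tolerant_decode(text):
    """Decode per the proposed contract: warn and skip entities of
    unknown type, and silently ignore extra fields on known ones."""
    record = json.loads(text)
    if record.get("type") not in KNOWN_ENTITY_TYPES:
        warnings.warn(f"skipping entity of unknown type {record.get('type')!r}")
        return None
    # Keep only the fields we understand; under the contract, extra
    # fields cannot change their interpretation.
    return {k: record[k] for k in ("id", "type") if k in record}

tolerant_decode('{"id": "Q1", "type": "item", "future-field": 1}')
# -> {'id': 'Q1', 'type': 'item'}
```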
Is this the contract that is going to be put forward? At some time in the not-too-distant future I hope that my company will be using Wikidata information in its products. This contract is likely to be problematic for development groups, who want some notion of how long they have to prepare for changes that could silently break their products.
Peter F. Patel-Schneider
Nuance Communications