On 08/05/2016 06:46 AM, Daniel Kinzler wrote:
> On 05.08.2016 at 15:02, Peter F. Patel-Schneider wrote:
>> I side firmly with Markus here.
>>
>> Consumers of data generally cannot tell whether the addition of a new
>> field to a data encoding is a breaking change or not.
> Without additional information, they cannot know, though for "mix and
> match" formats like JSON and XML, it's common practice to assume that
> ignoring additions is harmless.
The assumption that ignoring additions is harmless is a very dangerous
practice, even if it is common.
> In any case, we had communicated before that we do not consider the
> addition of a field a breaking change. It only becomes a breaking change
> when it impacts the interpretation of other fields, in which case we would
> announce it well in advance.
So some additions are breaking changes then. What is a system that consumes
this information supposed to do? If the system doesn't monitor announcements
then it has to assume that any new field can be a breaking change and thus
should not accept data that has any new fields.
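The strictness argued for here can be sketched in a few lines. This is a minimal illustration, not the actual Wikidata schema; the field names are hypothetical:

```python
import json
import sys

# Hypothetical set of top-level fields this consumer understands; any other
# field is treated as a potentially breaking change.
KNOWN_FIELDS = {"id", "type", "labels", "descriptions", "claims"}

def consume(raw):
    entity = json.loads(raw)
    unknown = set(entity) - KNOWN_FIELDS
    if unknown:
        # Warn, then refuse to produce output, since an unknown field may
        # change the interpretation of the fields we do understand.
        print("unexpected fields: " + ", ".join(sorted(unknown)),
              file=sys.stderr)
        raise ValueError("unknown fields: " + ", ".join(sorted(unknown)))
    return entity
```

A consumer like this fails loudly the moment the producer adds a field, which is exactly the point: without a contract, failing is safer than guessing.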
>> Given this, code that consumes encoded data should at least produce
>> warnings when it encounters encodings that it is not expecting, and
>> preferably should refuse to produce output in such circumstances.
> Depends on the circumstances. For a web browser, for example, this would be
> very annoying behavior. Nearly all websites would be unusable. Similarly,
> most email would become unreadable if mail clients were that strict.
I assume that you are referring to the common practice of adding extra fields
in HTTP and email transport and header structures under the assumption that
these extra fields will just be passed on to downstream systems and then
silently ignored when content is displayed. I view these as special cases
where there is at least an implicit contract that no additional field will
change the meaning of the existing fields and data. When such contracts are
in place, systems can indeed expect to see additional fields and are
permitted to ignore them.
>> Producers of data thus should signal in advance any changes to the
>> encoding, even if they know that the changes can be safely ignored.
> I disagree on "any". For example, do you want announcements about changes
> to the order of attributes in XML tags?
No.
> Why?
Because XML specifically states that the order of attributes is not
significant. Changing the order of XML attributes therefore does not change
the encoding.
> In case someone uses a regex to process the XML? Should you not be able to
> rely on your clients conforming to the XML spec, which says that the order
> of attributes is undefined?
Yes indeed. And there would be no problem in changing the order of entities
in the JSON dump as this order is deemed to be insignificant in well-behaved
JSON texts.
> In the case at hand (adding a field), it would have been good to
> communicate it in advance. But since it wasn't tagged as "breaking", it
> slipped through. We are sorry for that. Clients should still not choke on
> an addition like this.
Here is where I disagree. As there is no contract that new fields in the
Wikidata JSON dumps are not breaking, clients need to treat all new fields as
potentially breaking and thus should not accept data with unknown fields.
>> I would view software that consumes Wikidata information and silently
>> ignores fields that it is not expecting as deficient and would counsel
>> against using such software.
> Is this just for Wikidata, or does that extend to other kinds of data too?
> Why, or why not?
I say this for any data, except where there is a contract that such additional
fields are not meaning-changing.
> By definition, any extensible format or protocol (HTTP, SMTP, HTML, XML,
> XMPP, IRC, etc.) can contain parts (headers, elements, attributes) that the
> client does not know about, and should ignore. Of course, the spec will
> tell clients where to expect and allow extra bits.
Yes, these standards have explicit wording that there are certain places where
additional bits are allowed, and that these additional bits can be safely
ignored. Consumers of data in these standards can verify that the data has
not been corrupted and then safely ignore extra bits in certain places,
because they have a contract that the encoding of the data that they care
about is not affected by these extra bits. However, I don't see this contract
with respect to the Wikidata JSON encoding.
> That's why I'm planning to put up a document saying clearly what kinds of
> changes clients should be prepared to see in Wikidata output:
>
> Clients need to be prepared to encounter entity types and data types they
> don't know. But they should also allow additional fields in any JSON
> object. We guarantee that extra fields do not impact the interpretation of
> fields they know about - unless we have announced and documented a breaking
> change.
Is this the contract that is going to be put forward? At some time in the
not too distant future I hope that my company will be using Wikidata
information in its products. This contract is likely to be problematic for
development groups, who want some notion of how long they have to prepare
for changes that can silently break their products.
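For concreteness, a client written against the proposed contract might look something like the sketch below. The field and type names are hypothetical, not the actual Wikidata serialization; the point is that even a tolerant client should log what it ignores rather than discard it silently:

```python
import json
import logging

logging.basicConfig()
log = logging.getLogger("wikidata-client")

# Hypothetical known fields and entity types for this sketch. Under the
# proposed contract: unknown fields are ignored (but logged), and entities
# of unknown type are skipped rather than treated as fatal errors.
KNOWN_FIELDS = {"id", "type", "labels", "claims"}
KNOWN_TYPES = {"item", "property"}

def consume_tolerant(raw):
    entity = json.loads(raw)
    if entity.get("type") not in KNOWN_TYPES:
        log.warning("skipping entity of unknown type %r", entity.get("type"))
        return None
    for field in sorted(set(entity) - KNOWN_FIELDS):
        log.warning("ignoring unknown field %r", field)
    # Keep only the fields we understand.
    return {k: v for k, v in entity.items() if k in KNOWN_FIELDS}
```

Even so, such a client only works safely for as long as the producer honors the guarantee that extra fields never change the meaning of known ones, which is exactly the contract under discussion.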
Peter F. Patel-Schneider
Nuance Communications