Hi Markus!
You are asking us to communicate changes to our serialization better, even if they are not breaking changes according to the spec. I agree we should do that, and we are trying to improve our processes to achieve this.
Can we ask you in return to try to make your software more robust by not making unwarranted assumptions about the serialization format?
With regard to communicating more: it's very hard to tell which changes might break something for someone. For instance, some software might rely on the order of fields in a JSON object, even though JSON leaves that order unspecified, just as you rely on no fields being added, even though there is no guarantee of this. Similarly, some software might rely on non-ASCII characters being represented as Unicode escape sequences, and will break if we switch to the more compact UTF-8. Or it may break on changes in whitespace. Who knows. We cannot possibly know what kind of change will break some third-party software.
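To make the point concrete: a client that compares parsed values rather than raw text is unaffected by key order, whitespace, or whether non-ASCII characters are escaped. A minimal sketch using Python's standard `json` module; the sample strings are made up for illustration:

```python
import json

# Three serializations of the same JSON object: different key order,
# different whitespace, and "ä" written as a \u escape vs. literal UTF-8.
variants = [
    '{"id": "Q42", "label": "Bär"}',
    '{"label":"B\\u00e4r","id":"Q42"}',
    '{\n  "label": "Bär",\n  "id": "Q42"\n}',
]

# Parsing yields dicts; dict equality ignores key order, and the parser
# decodes escape sequences, so all three compare equal.
parsed = [json.loads(v) for v in variants]
assert all(p == parsed[0] for p in parsed)

# Comparing the raw strings, by contrast, reports three "different" documents.
print(len(set(variants)))  # 3
print(parsed[1]["label"])  # Bär
```

Code that only ever looks at the parsed structure simply cannot be broken by these kinds of cosmetic changes.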
I don't think announcing any and all changes is feasible. So I think an official policy about what we announce can be useful. Something like "This is what we consider a breaking change, and we will definitely announce it. And these are some kinds of changes we will also communicate ahead of time. And these are some things that can happen unannounced."
You are right that policies don't change the behavior of software. But perhaps they can change the behavior of programmers, by telling them what they can (and can't) safely rely on.
It boils down to this: we can try to be more verbose, but if you make assumptions beyond the spec, things will break sooner or later. Writing robust software requires more time and thought initially, but it saves a lot of headaches later.
-- daniel
On 04.08.2016 at 21:49, Markus Kroetzsch wrote:
Daniel,
You present arguments on issues that I would never even bring up. I think we fully agree on many things here. Main points of misunderstanding:
- I was not talking about the WMDE definition of "breaking change". I just meant "a change that breaks things". You can define this term for yourself as you like and I won't argue with this.
- I would never say that it is "right" that things break in this case. It's annoying. However, it is the standard behaviour of widely used JSON parsing libraries. We can't argue that away.
- I am not arguing that the change as such is bad. I just need to know about it to fix things before they break.
- I am fully aware of many places where my software should be improved, but I cannot fix all of them just to be prepared in case a change eventually happens (if it ever happens). I need to know about the next thing that breaks so I can prioritize it.
- The best way to fix this problem is to annotate each Jackson class individually with the respective switch. The global approach you linked to requires that every user of the classes apply the fix, which does not work for a library.
- When I asked for announcements, I did not mean information of the type "we plan to add more optional bits soonish". This ancient wiki page of yours, which mentions that some kind of change should happen at some point, is even more vague. It is more helpful to learn about changes when you know what they will look like and when they will happen. My assumption is that this is a low-cost improvement that is not too much to ask for.
- I did not follow what you want to make an "official policy" for. Software won't behave any differently just because there is a policy saying that it should.
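The difference between a per-class switch and a global setting can be sketched outside Java as well. In Jackson, the per-class switch is the `@JsonIgnoreProperties(ignoreUnknown = true)` annotation and the global one is `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` on the `ObjectMapper`. The Python below is only an analogy, assuming a hypothetical library type `Sitelink` with a hypothetical `from_json` helper: the tolerance lives in the library's own class, so callers need no extra configuration.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class Sitelink:
    """Hypothetical library type for one sitelink entry."""
    site: str
    title: str

    @classmethod
    def from_json(cls, data: Mapping[str, Any]) -> "Sitelink":
        # Keep only the keys this class declares; any other key (for example
        # a newly added one) is silently ignored. Because the filtering is
        # done here, every caller gets the tolerant behaviour automatically.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


raw = '{"site": "enwiki", "title": "Douglas Adams", "badges": [], "url": "https://en.wikipedia.org/wiki/Douglas_Adams"}'
link = Sitelink.from_json(json.loads(raw))
print(link)  # Sitelink(site='enwiki', title='Douglas Adams')
```

A missing required key still raises a `TypeError`, so the sketch tolerates additions without hiding genuine removals.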
Markus
On 04.08.2016 16:48, Daniel Kinzler wrote:
Hi Markus!
I would like to elaborate a little on what Lydia said.
On 04.08.2016 at 09:27, Markus Kroetzsch wrote:
It seems that some changes have been made to the JSON serialization recently:
This specific change has been announced in our JSON spec for as long as the document has existed. https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid says:
WARNING: wikibase-entityid may in the future change to be represented as a single string literal, or may even be dropped in favor of using the string value type to reference entities.
NOTE: There is currently no reliable mechanism for clients to generate a prefixed ID or a URL from the information in the data value.
That was the problem: with the previous format, every client needed a hard-coded mapping of entity types to prefixes in order to construct ID strings from the JSON serialization of ID values. That meant no entity type could be added without breaking clients. This has now been fixed.
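A minimal sketch of client code that follows this reasoning, assuming the data value is a parsed JSON object with the keys discussed here (`entity-type`, `numeric-id`, and the new `id`). The function name, the prefix table, and the `new-type` entity type in the usage lines are illustrative assumptions, not part of the spec:

```python
from typing import Any, Mapping, Optional

# The kind of hard-coded table clients needed before the "id" key existed.
# Only "item" -> "Q" and "property" -> "P" are assumed here; any other
# entity type cannot be handled this way.
LEGACY_PREFIXES = {"item": "Q", "property": "P"}


def entity_id_string(value: Mapping[str, Any]) -> Optional[str]:
    """Return the ID string of a wikibase-entityid data value, or None."""
    # Prefer the explicit "id" key: it works for every entity type,
    # including ones added later.
    if "id" in value:
        return value["id"]
    # Fall back to the old fields, which may be removed eventually.
    prefix = LEGACY_PREFIXES.get(value.get("entity-type"))
    numeric_id = value.get("numeric-id")
    if prefix is None or numeric_id is None:
        return None
    return f"{prefix}{numeric_id}"


print(entity_id_string({"entity-type": "item", "numeric-id": 42, "id": "Q42"}))  # Q42
print(entity_id_string({"entity-type": "property", "numeric-id": 31}))  # P31
print(entity_id_string({"entity-type": "new-type", "numeric-id": 1}))  # None
```

Reading `id` first means the client keeps working both when new entity types appear and when the legacy fields are eventually dropped.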
Of course, it would have been good to announce this in advance. However, it is not a breaking change, and we do not plan to treat additions as breaking changes.
Adding something to a public interface is not a breaking change. Adding a method to an API isn't, adding an element to XML isn't, and adding a key to JSON isn't, unless there is a spec that explicitly states otherwise.
These are "mix and match" formats, in which anything that isn't forbidden is allowed. It is the client's responsibility to accommodate such changes. This is simple best practice: an HTTP client shouldn't choke on header fields it doesn't know, etc. See https://en.wikipedia.org/wiki/Robustness_principle.
If you use a library that is touchy about extra data by default, configure it to be more accommodating; see for instance https://stackoverflow.com/questions/14343477/how-do-you-globally-set-jackson-to-ignore-unknown-properties-within-spring.
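The same principle applies one level up: a reader can skip value types it does not recognize instead of aborting, much as an HTTP client ignores unknown headers. A sketch assuming each data value is a JSON object with `type` and `value` keys; the handler table, the `render_values` function, and the `some-future-type` entry are hypothetical:

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical handlers for the value types this client understands.
HANDLERS: Dict[str, Callable[[Any], str]] = {
    "string": lambda v: v,
    "wikibase-entityid": lambda v: v.get("id", "?"),
}


def render_values(datavalues: Iterable[Dict[str, Any]]) -> List[str]:
    out = []
    for dv in datavalues:
        handler = HANDLERS.get(dv.get("type"))
        if handler is None:
            # Unknown type: skip it rather than failing the whole run.
            continue
        out.append(handler(dv.get("value")))
    return out


values = [
    {"type": "string", "value": "hello"},
    {"type": "some-future-type", "value": {"x": 1}},
    {"type": "wikibase-entityid", "value": {"entity-type": "item", "numeric-id": 42, "id": "Q42"}},
]
print(render_values(values))  # ['hello', 'Q42']
```

Whether skipping silently is acceptable, or whether unknown types should at least be logged, is a design choice for each client.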
Could somebody from the dev team please comment on this? Is this going to be in the dumps as well or just in the API?
Yes, we use the same basic serialization for the API and the dumps. For the future, note that some parts (such as sitelink URLs) are optional, and we plan to add more optional bits (such as normalized quantities) soonish.
Are further changes coming up?
Yes. The next one in the pipeline is Quantities without upperBound and lowerBound, see https://phabricator.wikimedia.org/T115270. That IS a breaking change, and the implementation is thus blocked on announcing it, see https://gerrit.wikimedia.org/r/#/c/302248/.
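Since this change removes keys rather than adding them, clients that read `upperBound` and `lowerBound` unconditionally will fail. A minimal sketch of a reader that treats both bounds as optional, assuming amounts and bounds arrive as decimal strings such as `"+10"`; the `Quantity` class and `parse_quantity` function are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Mapping, Optional


@dataclass
class Quantity:
    amount: Decimal
    unit: str
    upper_bound: Optional[Decimal] = None
    lower_bound: Optional[Decimal] = None


def parse_quantity(value: Mapping[str, Any]) -> Quantity:
    """Parse a quantity data value, treating both bounds as optional."""
    def opt(key: str) -> Optional[Decimal]:
        raw = value.get(key)
        return Decimal(raw) if raw is not None else None

    return Quantity(
        amount=Decimal(value["amount"]),
        unit=value["unit"],
        upper_bound=opt("upperBound"),
        lower_bound=opt("lowerBound"),
    )


with_bounds = parse_quantity({"amount": "+10", "unit": "1", "upperBound": "+11", "lowerBound": "+9"})
without_bounds = parse_quantity({"amount": "+10", "unit": "1"})
print(without_bounds.upper_bound)  # None
```

`Decimal` is used instead of `float` so that the decimal strings are kept exactly.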
Furthermore, we will probably remove the entity-type and numeric-id fields from the serialization of EntityIdValues eventually. But there is no concrete plan for that at the moment.
When we remove the old fields for ItemId and PropertyId, that IS a breaking change, and will be announced as such.
Are we ever going to get email notifications of API changes implemented by the team rather than having to fix the damage after they happened?
We aspire to communicate early, and we are sorry we did not announce this change ahead of time.
However, this is not a breaking change by the common understanding of the term, and will not be treated as such. We have argued about this on this list before, see https://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html. I made it clear back then what we consider a breaking change and what not, and I advised you that being accommodating in what your client code accepts will avoid headaches in the future.
To make this even more clear, we will enact and document something similar to my email from February as official policy soon. Watch for an announcement on this list.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata