Re: [Wikidata] Breaking change in JSON serialization?

5 Aug 2016

Hi Markus!
You are asking us to better communicate changes to our serialization, even if
it's not a breaking change according to the spec. I agree we should do that. We
are trying to improve our processes to achieve this.
Can we ask you in return to try to make your software more robust, by not making
unwarranted assumptions about the serialization format?
With regards to communicating more - it's very hard to tell which changes might
break something for someone. For instance, some software might rely on the order
of fields in a JSON object, even though JSON says this is unspecified, just like
you rely on no fields being added, even though there is no guarantee about this.
Similarly, some software might rely on non-ascii characters being represented as
unicode escape sequences, and will break if we use the more compact utf-8. Or
they may break on changes to whitespace. Who knows. We cannot possibly know what
kind of change will break some 3rd party software.
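To make the point about incidental serialization details concrete, here is a small Python sketch. The sample documents are made up for illustration and are not real Wikidata output. A consumer that works on parsed values rather than raw text is unaffected by key order, escaping, or whitespace:

```python
import json

# Three serializations of the same made-up entity. They differ only in
# key order, whitespace, and how the non-ASCII character is written.
compact = '{"id":"Q1","label":"Universit\\u00e4t"}'
reordered = '{"label":"Universität","id":"Q1"}'
pretty = '{\n  "id": "Q1",\n  "label": "Universität"\n}'

parsed = [json.loads(text) for text in (compact, reordered, pretty)]

# The raw strings all differ, but the parsed values are equal.
raw_texts_differ = len({compact, reordered, pretty}) == 3
parsed_values_equal = parsed[0] == parsed[1] == parsed[2]
```

Code that compares or pattern-matches the raw text would treat these three documents as different, which is exactly the kind of assumption that goes beyond the spec.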
I don't think announcing any and all changes is feasible. So I think an official
policy about what we announce can be useful. Something like "This is what we
consider a breaking change, and we will definitely announce it. And these are
some kinds of changes we will also communicate ahead of time. And these are some
things that can happen unannounced."
You are right that policies don't change the behavior of software. But perhaps
they can change the behavior of programmers, by telling them what they can (and
can't) safely rely on.
It boils down to this: we can try to be more verbose, but if you make
assumptions beyond the spec, things will break sooner or later. Writing robust
software requires more time and thought initially, but it saves a lot of
headaches later.
-- daniel
On 04.08.2016 at 21:49, Markus Kroetzsch wrote:
...
Daniel,
You present arguments on issues that I would never even bring up. I think we
fully agree on many things here. Main points of misunderstanding:

I was not talking about the WMDE definition of "breaking change". I just meant
"a change that breaks things". You can define this term for yourself as you like
and I won't argue with this.

I would never say that it is "right" that things break in this case. It's
annoying. However, it is the standard behaviour of widely used JSON parsing
libraries. We won't discuss it away.

I am not arguing that the change as such is bad. I just need to know about it
to fix things before they break.

I am fully aware of many places where my software should be improved, but I
cannot fix all of them just to be prepared if a change should eventually happen
(if it ever happens). I need to know about the next thing that breaks so I can
prioritize this.

The best way to fix this problem is to annotate all Jackson classes with the
respective switch individually. The global approach you linked to requires that
all users of the classes implement the fix, which does not work for a library.

When I asked for announcements, I did not mean information of the type "we
plan to add more optional bits soonish". This ancient wiki page of yours that
mentions that some kind of change should happen at some point is even more
vague. It is more helpful to learn about changes when you know how they will
look and when they will happen. My assumption is that this is a "low cost"
improvement that is not too much to ask for.

I did not follow what you want to make an "official policy" for. Software
won't behave any differently just because there is a policy saying that it should.
Markus
On 04.08.2016 16:48, Daniel Kinzler wrote:
...
Hi Markus!
I would like to elaborate a little on what Lydia said.
On 04.08.2016 at 09:27, Markus Kroetzsch wrote:
...
It seems that some changes have been made to the JSON serialization recently:
https://github.com/Wikidata/Wikidata-Toolkit/issues/237
This specific change has been announced in our JSON spec for as long as the
document exists.
https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid says:
...
WARNING: wikibase-entityid may in the future change to be represented as a
single string literal, or may even be dropped in favor of using the string
value type to reference entities.
NOTE: There is currently no reliable mechanism for clients to generate a
prefixed ID or a URL from the information in the data value.
That was the problem: With the current format, all clients needed a hard coded
mapping of entity types to prefixes, in order to construct ID strings from the
JSON serialization of ID values. That means no entity types can be added without
breaking clients. This has now been fixed.
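As a rough sketch of what this means for client code, the Python function below prefers a full prefixed ID when the data value carries one and falls back to the old fields otherwise. The key name "id" for the new field and the contents of the prefix table are assumptions made for illustration; "entity-type" and "numeric-id" are the existing fields.

```python
# Hypothetical legacy table: the hard-coded mapping of entity types to
# prefixes that clients needed with the old format.
LEGACY_PREFIXES = {"item": "Q", "property": "P"}


def entity_id(value: dict) -> str:
    """Return the prefixed ID for a wikibase-entityid data value."""
    # Prefer the full ID when the serialization provides it
    # (assumed key name: "id").
    if "id" in value:
        return value["id"]
    # Otherwise rebuild it from the old fields, which only works for
    # entity types listed in the hard-coded table.
    prefix = LEGACY_PREFIXES[value["entity-type"]]
    return f"{prefix}{value['numeric-id']}"
```

With the fallback alone, a value of an unlisted entity type raises a KeyError. With the full ID present, no table lookup is needed at all.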
Of course, it would have been good to announce this in advance. However, it is
not a breaking change, and we do not plan to treat additions as breaking changes.
Adding something to a public interface is not a breaking change. Adding a method
to an API isn't, adding an element to XML isn't, and adding a key to JSON isn't,
unless there is a spec that explicitly states otherwise.

These are "mix and match" formats, in which anything that isn't forbidden is
allowed. It's the responsibility of the client to accommodate such changes. This
is simple best practice - an HTTP client shouldn't choke on header fields it
doesn't know, etc. See https://en.wikipedia.org/wiki/Robustness_principle.
If you use a library that is touchy about extra data per default, configure it
to be more accommodating, see for instance
https://stackoverflow.com/questions/14343477/how-do-you-globally-set-jackson-to-ignore-unknown-properties-within-spring.
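The same idea can be sketched outside of Jackson. The Python example below is an analogue, not the Jackson configuration itself: a strict constructor rejects unknown keys, while a tolerant factory method drops them. The SiteLink class and the sample document are hypothetical and only serve as illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SiteLink:
    site: str
    title: str

    @classmethod
    def from_json(cls, data: dict) -> "SiteLink":
        # Keep only the keys this class knows about, so that keys
        # added to the serialization later are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Made-up document with one key ("url") that the class does not model.
doc = json.loads(
    '{"site": "enwiki", "title": "Douglas Adams", "url": "https://example.org/x"}'
)

link = SiteLink.from_json(doc)  # tolerant: the extra key is dropped

try:
    SiteLink(**doc)  # strict: the unexpected keyword argument is rejected
    strict_failed = False
except TypeError:
    strict_failed = True
```

Putting the tolerance into the class itself, rather than into a global parser setting, means users of the class get it without any configuration on their side.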
...
Could somebody from the dev team please comment on this? Is this going to be in
the dumps as well or just in the API?
Yes, we use the same basic serialization for the API and the dumps. For the
future, note that some parts (such as sitelink URLs) are optional, and we plan
to add more optional bits (such as normalized quantities) soonish.
...
Are further changes coming up?
Yes. The next one in the pipeline is Quantities without upperBound and
lowerBound, see https://phabricator.wikimedia.org/T115270. That IS a breaking
change, and the implementation is thus blocked on announcing it, see
https://gerrit.wikimedia.org/r/#/c/302248/.
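A client can prepare for this by treating the bounds as optional. The Python sketch below assumes that a quantity value is a JSON object with the string fields "amount", "upperBound" and "lowerBound"; the function name and the sample values are hypothetical.

```python
from decimal import Decimal
from typing import Optional, Tuple


def quantity_bounds(
    value: dict,
) -> Tuple[Decimal, Optional[Decimal], Optional[Decimal]]:
    """Return (amount, lower, upper); a missing bound becomes None."""
    amount = Decimal(value["amount"])
    # Use .get() so that a quantity without bounds does not raise.
    lower = value.get("lowerBound")
    upper = value.get("upperBound")
    return (
        amount,
        Decimal(lower) if lower is not None else None,
        Decimal(upper) if upper is not None else None,
    )
```

Code that indexes value["upperBound"] directly would raise a KeyError on a quantity without bounds, which is why this change counts as breaking.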
Furthermore, we will probably remove the entity-type and numeric-id fields from
the serialization of EntityIdValues eventually. But there is no concrete plan
for that at the moment.
When we remove the old fields for ItemId and PropertyId, that IS a breaking
change, and will be announced as such.
...
Are we ever
going to get email notifications of API changes implemented by the team rather
than having to fix the damage after they happened?
We aspire to communicate early, and we are sorry we did not announce this change
ahead of time.
However, this is not a breaking change by the common understanding of the term,
and will not be treated as such. We have argued about that on this list before,
see
https://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html.
I made it clear back then what we consider a breaking change and what we do not,
and I have advised you that being accommodating in what your client code accepts
will avoid headaches in the future.
To make this even more clear, we will enact and document something similar to my
email from February as official policy soon. Watch for an announcement on this
list.

Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
