Orders in JSON


Markus Krötzsch

23 Jun 2014

2:39 p.m.

Hi,

Two quick questions about orderings of stuff in JSON. Currently, we have two order-related keys:

snaks-order
qualifiers-order

They are used to specify the order of groups of snaks in references (snaks-order) and statements (qualifiers-order). This is needed since the snak groups are stored in JSON maps in both cases (property => snak list), and maps do not have order semantics.
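A minimal sketch of the layout described above, in Python, may make the problem concrete. The property IDs, the helper name `group_snaks`, and the heavily reduced snak dictionaries are assumptions for illustration, not actual Wikibase output; only the key names `snaks` and `snaks-order` come from the message.

```python
def group_snaks(ordered_snaks):
    """Group an ordered list of snaks by property, recording group order.

    Returns a dict shaped like a reference: a "snaks" map
    (property => snak list) plus a "snaks-order" list of property IDs.
    """
    snaks = {}
    order = []
    for snak in ordered_snaks:
        prop = snak["property"]
        if prop not in snaks:
            snaks[prop] = []
            order.append(prop)  # first appearance fixes the group's position
        snaks[prop].append(snak)
    return {"snaks": snaks, "snaks-order": order}


# Hypothetical, heavily reduced snaks; real snaks carry more fields.
reference = group_snaks([
    {"property": "P248", "value": "source A"},
    {"property": "P143", "value": "source B"},
    {"property": "P248", "value": "source C"},
])
```

Without the `snaks-order` list, a consumer that only sees the map has no reliable way to tell that the P248 group comes before the P143 group.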

Question 1: Why don't we also have some information about statement/claim order? This seems to be necessary for using the API JSON internally as planned.

Question 2: Wouldn't it be more convenient to store lists of things in all cases, and have the "map" version just as an optional API switch for users who don't care about order (it could remain the default)? This would help to retrieve order information more easily.

Cheers,

Markus


Jeroen De Dauw

23 Jun 2014

5:44 p.m.

Hey,

...

Question 1: Why don't we also have some information about statement/claim order? This seems to be necessary for using the API JSON internally as planned.

To answer the first part: probably because no one got to that yet. As for the second part: how so? The internal format also does not have this.

...

Question 2: Wouldn't it be more convenient to store lists of things in all cases, and have the "map" version just as an optional API switch for users who don't care about order (it could remain the default)? This would help to retrieve order information more easily.

That would better serve the WDTK use case, and those who do a full deserialization. I strongly suspect most users of the JSON do not fall into that category.

Cheers

-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3

Markus Krötzsch

7:28 p.m.

On 23/06/14 19:44, Jeroen De Dauw wrote:

...

Hey,

...
Question 1: Why don't we also have some information about statement/claim order? This seems to be necessary for using the API JSON internally as planned.

To answer the first part: probably because no one got to that yet. As for the second part: how so? The internal format also does not have this.

I would have thought that the answer to your second point is that Wikibase also needs to preserve the order, and JSON parsers do not guarantee this for maps. Maybe I am misunderstanding this, but how can the client-side JavaScript build the statements in the correct order if it just gets a map? Is there some built-in order in JSON maps after all? Then we could just drop the other order fields too.
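A quick check in Python illustrates the concern. The sample keys are made up; the point is only that two JSON objects that differ in key order decode to equal values, while two arrays that differ in element order do not.

```python
import json

# Same members, different key order: equal once decoded as maps.
a = json.loads('{"P1": ["x"], "P2": ["y"]}')
b = json.loads('{"P2": ["y"], "P1": ["x"]}')
maps_equal = (a == b)

# Same elements, different order: arrays keep order semantics.
c = json.loads('["P1", "P2"]')
d = json.loads('["P2", "P1"]')
lists_equal = (c == d)
```

RFC 8259 describes an object as an unordered collection of name/value pairs, so a consumer cannot rely on member order even if a particular parser happens to preserve it.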

...

...
Question 2: Wouldn't it be more convenient to store lists of things in all cases, and have the "map" version just as an optional API switch for users who don't care about order (it could remain the default)? This would help to retrieve order information more easily.

That would better serve the WDTK use case, and those who do a full deserialization. I strongly suspect most users of the JSON do not fall into that category.

My suggestion serves all users, since it gives you the old and the new behaviour. Anyway, the requirements of WDTK are not any stronger than the requirements of Wikibase itself -- information not available to Wikibase won't need to be available to WDTK either.

Cheers,

Markus

h

24 Jun 2014

7:14 a.m.

...

Question 2: Wouldn't it be more convenient to store lists of things in all cases, and have the "map" version just as an optional API switch for users who don't care about order (it could remain the default)? This would help to retrieve order information more easily.

Strong support! As I see it, the mappings are causing real pain, since they simply do not convey that things are, in fact, stored in order. The maps produce a misleading representation which may be more convenient at first glance but becomes confusing as soon as someone digs deeper into the API: the "snaks-order" and "qualifiers-order" hacks, which were implemented when everybody realized that order was missing, add to complexity big time. Personally, I would even drop the mappings completely. Yes, the maps allow pretty fast access to values (in particular for users inexperienced in dealing with APIs), but rest assured that people using the API will get along fine without them, and Wikibase would have a single, clean and more consistent interface. Do not fear the change. :) I shall reiterate: STRONG SUPPORT!

Daniel Kinzler

7:32 a.m.

Hearing these arguments while we are in the process of consolidating the internal and external representations makes me feel like we actually do want different serializations: one that contains all the info, including order, and one that is convenient to use for the most common use cases.

I suspect that the vast majority of API users do not care about order. I also like the option to access things by their ID directly, without iterating over everything. I'd hate to give that up.

A serialization option would be a possibility (like we also have groups/ungrouped mode) - we in fact already have such a mode, it's used for generating XML output from the API; the XML serialization doesn't like IDs being used as keys, it wants lists. We'd just need to expose that setting.

But then we are back to having different serialization formats for the API and internal storage/dumps. The internal format would be much saner than it is now, and much more similar to the API format, but it would still be different.

-- daniel

On 24.06.2014 09:14, h wrote:

...


Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

Fredo Erxleben

8:34 a.m.

On 24.06.2014 09:32, Daniel Kinzler wrote:

...


How about keeping the maps but introducing an "order" field into the map's values, instead of tacking the order on in a separate list behind the map? It serves both views, and restoring order would be easy.

Adrian Lang

8:37 a.m.

I support dropping maps (at least in the default format) in favor of arrays where things actually are sorted lists.

On Tue, Jun 24, 2014 at 9:32 AM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:

...


Thiemo Mättig

9:03 a.m.

...

the mappings are causing real pain

How and where? Can you please give examples and link to them?

...

the "snaks-order" and "qualifiers-order" hacks [...] add to complexity big time.

I'm sorry but this is simply not true. All you need is a single additional line of code.

With the proposed lists:

foreach ( snak in snaks ) { ... }

With the current mappings:

foreach ( id in snaks-order ) { snak = snaks[id] ... }

The current approach gives everybody the best from both worlds without the need to implement, test, maintain, support and bugfix two different formats.
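The two loops above can be written out as runnable Python. The sample data and the variable names are assumptions for illustration; only the keys `snaks` and `snaks-order` are taken from the current format.

```python
# Proposed format: snaks are simply an ordered list.
snaks_list = [
    {"property": "P248", "value": "source A"},
    {"property": "P143", "value": "source B"},
]

# Current format: a map (property => snak list) plus an order list.
reference = {
    "snaks": {
        "P143": [{"property": "P143", "value": "source B"}],
        "P248": [{"property": "P248", "value": "source A"}],
    },
    "snaks-order": ["P248", "P143"],
}

from_list = [snak["value"] for snak in snaks_list]

from_map = []
for prop in reference["snaks-order"]:      # the one additional line
    for snak in reference["snaks"][prop]:  # each property maps to a snak list
        from_map.append(snak["value"])
```

Both loops visit the snaks in the same sequence. Because each map value is itself a list of snaks, the map variant needs an inner loop as well.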

I really wonder why we are discussing this. Markus, just add similar ...-order arrays to places that miss them at the moment. Done.

Best Thiemo

-- Thiemo Mättig Software Developer Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin Tel. (030) 219 158 26-0 http://wikimedia.de Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it happen! http://spenden.wikimedia.de/ Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

Fredo Erxleben

9:27 a.m.

On 24.06.2014 11:03, Thiemo Mättig wrote:

...

I'm sorry but this is simply not true. All you need is a single additional line of code.

With the proposed lists:

foreach ( snak in snaks ) { ... }

With the current mappings:

foreach ( id in snaks-order ) { snak = snaks[id] ... }

The current approach gives everybody the best from both worlds without the need to implement, test, maintain, support and bugfix two different formats.

I really wonder why we are discussing this. Markus, just add similar ...-order arrays to places that miss them at the moment. Done.

As the one who is doing the JSON for WDTK at the moment:

If I understand you correctly, you want me to keep an additional list for the order alongside the map itself. This will create a nice memory overhead if you want to process all the items.

The JSON parser we intend to employ works by structural matching, so one will have a good time explaining to it how to match list+map onto a list. This would require some kind of JSON post-processing, which in turn would slow us down. This is the exact opposite of what we want to achieve by using a more advanced JSON parsing solution.

I could just as well argue that we keep the list, and you can recreate a map from this list if you don't need ordering but want faster access.

We gain nothing by bashing use cases, they are diverse and all equally valid.

...


Markus Krötzsch

1:43 p.m.

I've seen three formats proposed so far:

(1) Map + order fields (current format)
(2) Arrays
(3) Map + sort-index inside each map item

The last was proposed by Fredo; I think it got lost a bit. The idea there would be to store something like "index: 1" in the objects that are inside the map to define their order. Advantage: order information in the same subtree of the JSON (works with parsers that parallelize subtree parsing in maps). Disadvantage: objects get a new index field that is not really part of the object data but part of the context in which the object is used. Also, it still is some work to build the real list.
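A rough Python sketch of format (3) shows the extra step needed to rebuild the list. The field name `index`, the wrapper key `snaks`, and the property IDs are assumptions for illustration; no such format exists yet.

```python
# Format (3): each map entry carries its own position.
# The field name "index" and the IDs are assumptions for illustration.
snaks = {
    "P143": {"index": 1, "snaks": ["source B"]},
    "P248": {"index": 0, "snaks": ["source A"]},
}

# Rebuilding the real list means sorting the entries by their index.
ordered_props = sorted(snaks, key=lambda prop: snaks[prop]["index"])
ordered_values = [v for prop in ordered_props for v in snaks[prop]["snaks"]]
```

The sort is cheap, but it is still a separate pass over every map that needs ordering.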

On 24/06/14 11:27, Fredo Erxleben wrote: ...

...

We gain nothing by bashing use cases, they are diverse and all equally valid.

Agreed. I can see use cases for any of the formats. WDTK is less of an issue here than direct API users (bots and scripts). A script that wants to read statements will probably prefer a map; a bot that wants to change the order of properties in references would probably prefer arrays. I don't think we should argue whether one is more important than the other.

From the current discussion, I would prefer the Arrays for the "internal" "main" format. Conceptually, we have a list of things, so using an Array would harmonize the JSON with the data. Daniel suggested to have a switch in the API to select format (1) optionally. I also think that something like this is needed if there would be such a change in the JSON (already for b/c with existing applications).

I don't think performance is a big thing to discuss here. Any of the options we discuss will be fast enough for most applications. I would focus on the design, which needs to be fixed one way or the other (since order data for statements is missing completely).

Cheers,

Markus

Fredo Erxleben

6:01 p.m.

...

From the current discussion, I would prefer the Arrays for the "internal" "main" format. Conceptually, we have a list of things, so using an Array would harmonize the JSON with the data. Daniel suggested to have a switch in the API to select format (1) optionally. I also think that something like this is needed if there would be such a change in the JSON (already for b/c with existing applications).

Then we have two formats again, only a bit less different. There must be more possibilities to solve this for everyone. Is there a way we can fix the order implicitly? Or can we ensure that the map is always written into JSON in the correct order? Maybe use a combination of Property and index as key?
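As a sketch of the "written in the correct order" idea: Python's `json` module writes dictionary entries in insertion order and can hand back an object's members in document order through `object_pairs_hook`. This only shows what one parser does; that every consumer's parser behaves this way is an assumption, and the property IDs are invented.

```python
import json

# Serialize a map whose insertion order is the intended order.
text = json.dumps({"P248": ["source A"], "P143": ["source B"]})

# object_pairs_hook receives each object's members as a list of
# (key, value) pairs in the order they appear in the JSON text.
pairs = json.loads(text, object_pairs_hook=list)
order = [key for key, _ in pairs]
```

This works only as long as both the writer and every reader expose member order, which the JSON specification does not require.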

Thomas Douillard

8:22 p.m.

Then we have two formats again. Only a bit less different. There must be more possibilities to solve this for everyone.

...

Is there a way we can fix the order implicitly? Or can we ensure that the map is always written into JSON in the correct order? Maybe use a combination of Property and index as key?

There are not really many possible or plausible implementations. But the real question is: is this worth the headache? Premature optimisation is the root of all evil.

As a user, I would prefer that you focus on problems that really have an impact rather than on the question of having only one serialisation format, which is really a scratch in the big picture.
