Hello,
When looking at the external JSON, I found that it uses different notations for pretty much the same thing. For item ids on the document level, the JSON has the form:
"id":"Q1", "type":"item"
For item ids on the level of a snak's value, it is:
"entity-type":"item", "numeric-id":"1"
Is there any deeper reason that prevents using only one of these forms? (Also note that "entity-type" is redundant, since it is already given by the snak's "datatype":"wikibase-item".)
Sincerely, Fredo Erxleben
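For readers who have to consume both notations, a minimal Python sketch of a normalizer follows. It is not part of any Wikidata library: the function name `entity_id` and the table `PREFIX_BY_TYPE` are made up for illustration, and the mapping assumes the usual "Q" prefix for items and "P" prefix for properties.

```python
# Assumption: items use the "Q" prefix and properties the "P" prefix.
PREFIX_BY_TYPE = {"item": "Q", "property": "P"}


def entity_id(value: dict) -> str:
    """Return a string id such as "Q1" from either of the two notations."""
    # Document-level form: {"id": "Q1", "type": "item"}
    if "id" in value:
        return value["id"]
    # Snak-value form: {"entity-type": "item", "numeric-id": "1"}
    entity_type = value["entity-type"]
    prefix = PREFIX_BY_TYPE.get(entity_type)
    if prefix is None:
        raise ValueError(f"no known id prefix for entity type {entity_type!r}")
    # int() accepts both "1" and 1, so either spelling of the number works.
    return f"{prefix}{int(value['numeric-id'])}"
```

Both `entity_id({"id": "Q1", "type": "item"})` and `entity_id({"entity-type": "item", "numeric-id": "1"})` yield the same string, so downstream code only has to handle one form.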
Hey,
That's a good question.
The reason we have
"entity-type":"item",
"numeric-id":"1"
is "for legacy reasons". This actually makes little sense any more, and is problematic in that it assumes all ids have a numeric part. That assumption will likely break in the near future when we add Commons support. I'd much rather have the JSON contain simply the string "Q1" than an array with the above elements. The reason why this has not happened yet is that it is a big breaking change for users of our API. Due to Commons, it looks like we will soon be forced to make such a change though. (Such a change will of course be announced well in advance of being deployed.)
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
On 16.06.2014 20:28, Jeroen De Dauw wrote: ...
Thanks for the answer. "Legacy" is becoming a word I encounter far too often when working with Wikidata JSON -.-* So, since there will be a change in the near future anyway, how about introducing a field for the JSON schema version in use? Just something like:
{
  "version": "2014-07-01",
  // blubb… usual stuff here
}
Also: can we cut a bit of redundancy? If the thing has an id of the form "Q[number]", it is by definition an item. No need to mention that again. (The reason here is serialization effort, not deserialization in any way.)
Is there a planned date for when "soon" will be? What would be the best way to join the discussion of the upcoming format? Is there a proposal anywhere? Also: the JSON descriptions on the Wikidata wiki seem outdated…
Sorry for the passive-aggressive tone, but having to write a new JSON (de)serialization every quarter year is not that productive.
Best greetings Fredo Erxleben
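To illustrate what such a version field would buy a consumer, here is a minimal Python sketch that picks a deserializer based on it. Everything in it is hypothetical: the "version" key is only the proposal made above, the version string is the example date from that proposal, and `parse_2014_07_01`, `PARSERS` and `parse_entity` are invented names. Note also that standard JSON has no comment syntax, so a real document could not carry the `//` line shown in the proposal.

```python
import json
from typing import Callable, Dict


def parse_2014_07_01(doc: dict) -> dict:
    """Hypothetical parser for the proposed "2014-07-01" schema version."""
    return {"id": doc["id"], "type": doc["type"]}


# One parser per known schema version; unknown versions are rejected loudly.
PARSERS: Dict[str, Callable[[dict], dict]] = {
    "2014-07-01": parse_2014_07_01,
}


def parse_entity(raw: str) -> dict:
    """Parse a JSON string by dispatching on its (proposed) "version" field."""
    doc = json.loads(raw)
    version = doc.get("version")
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported or missing schema version: {version!r}")
    return parser(doc)
```

With a tag like this, supporting a new format means adding one entry to the table instead of guessing the format from the shape of the data.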
On 17/06/14 17:24, Fredo Erxleben wrote: ...
Also: can we cut a bit of redundancy? If the thing has an id of the form "Q[number]", it is by definition an item. No need to mention that again. (The reason here is serialization effort, not deserialization in any way.)
It is likely that it will not be possible to tell the entity type from the entity id for all entity types introduced in the future.
Is there a planned date for when "soon" will be? What would be the best way to join the discussion of the upcoming format? Is there a proposal anywhere? Also: the JSON descriptions on the Wikidata wiki seem outdated…
Sorry for the passive-aggressive tone, but having to write a new JSON (de)serialization every quarter year is not that productive.
Well, on the up side, it will really motivate people to use Wikidata Toolkit ;-)
Cheers,
Markus
Hey,
Sorry for the passive-aggressive tone, but having to write a new JSON
(de)serialization every quarter year is not that productive.
Considering that we have not changed the internal format like this before, and that we are changing it to the one already used by the web API, in part to have fewer differences, this seems an unfair characterization to me.
Well, on the up side, it will really motivate people to use Wikidata
Toolkit ;-)
And of course, for those creating PHP-based tools, there is the PHP library we ourselves are using: https://github.com/wmde/WikibaseDataModelSerialization
Cheers
-- Jeroen De Dauw
On 17.06.2014 21:17, Jeroen De Dauw wrote:
Hey,
Sorry for the passive-aggressive tone, but having to write a new JSON
(de)serialization every quarter year is not that productive.
Considering that we have not changed the internal format like this before, and that we are changing it to the one already used by the web API, in part to have fewer differences, this seems an unfair characterization to me.
Okay, strike the "quarter". The point is that there have been a bunch of changes, and they all have to be supported. I am aware that there will probably never be a perfect JSON version. Looking at the dumps, it seems that items are written out in the then-current JSON version whenever they are updated. Items that have not been updated for a long time therefore remain in an older JSON version.
TLDR:
- having a versioning tag would help a lot
- not mixing JSON versions in the dump would help a lot
- documenting the existing versions would help a lot (code is no documentation)
Even if no-one wants to do it retroactively, it would be nice if we could make a habit out of it in the future. </whining><working>
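Without a version tag, a consumer of a mixed dump is left guessing the format from the shape of each record. The Python sketch below shows that kind of guesswork under one stated assumption: that a record in the external format carries string-valued "id" and "type" keys at the top level, as in the example at the start of this thread. Anything else is merely counted as "other"; the sketch does not claim to know what the older formats look like. The function names are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def looks_like_external_format(doc: dict) -> bool:
    """Heuristic only: external-format records have string "id" and "type"."""
    return isinstance(doc.get("id"), str) and isinstance(doc.get("type"), str)


def count_formats(docs: Iterable[dict]) -> Counter:
    """Tally already-parsed dump records by the format they appear to use."""
    counts: Counter = Counter()
    for doc in docs:
        label = "external" if looks_like_external_format(doc) else "other"
        counts[label] += 1
    return counts
```

A tally like this at least shows how mixed a given dump is, but it remains a heuristic, which is the argument for an explicit version tag.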
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
On Wed, Jun 18, 2014 at 11:05 AM, Fredo Erxleben fredo.erxleben@tu-dresden.de wrote:
TLDR:
- having a versioning tag would help a lot
- not mixing JSON versions in the dump would help a lot
We are working on that right now.
- documenting the existing versions would help a lot (code is no
documentation)
Even if no-one wants to do it retroactively, it would be nice if we could make a habit out of it in the future. </whining><working>
Cheers Lydia
On 17.06.2014 20:12, Markus Krötzsch wrote:
On 17/06/14 17:24, Fredo Erxleben wrote: ...
Also: can we cut a bit of redundancy? If the thing has an id of the form "Q[number]", it is by definition an item. No need to mention that again. (The reason here is serialization effort, not deserialization in any way.)
It is likely that it will not be possible to tell the entity type from the entity id for all entity types introduced in the future.
Can you please elaborate? Is there any reason that the prefixing we use now might break?
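To make the trade-off in this exchange concrete, here is a minimal Python sketch of prefix-based type inference. It is not an answer on anyone's behalf: the prefix table assumes only the current "Q" (item) and "P" (property) prefixes, the function names are invented, and the non-numeric id used in the usage note is a made-up example of what a future id might look like.

```python
import re
from typing import Optional

# Assumption: only these two prefixes are known today.
KNOWN_PREFIXES = {"Q": "item", "P": "property"}
ID_PATTERN = re.compile(r"([A-Za-z]+)(\d+)")


def type_from_id(id_string: str) -> Optional[str]:
    """Infer the entity type from a prefixed id, or None if that fails."""
    match = ID_PATTERN.fullmatch(id_string)
    if match is None:
        return None  # id does not have the "letters + digits" shape
    return KNOWN_PREFIXES.get(match.group(1).upper())


def entity_type(doc: dict) -> str:
    """Prefer an explicit "type" field; fall back to the id prefix."""
    explicit = doc.get("type")
    if explicit is not None:
        return explicit
    inferred = type_from_id(doc["id"])
    if inferred is None:
        raise ValueError(f"cannot infer entity type from id {doc['id']!r}")
    return inferred
```

`type_from_id("Q1")` gives `"item"`, while a hypothetical id such as `"File:Example.jpg"` gives `None`. In that second case only an explicit type field lets the consumer proceed, which is the situation in which the seemingly redundant field stops being redundant.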