Hi Fredo,
On 20/02/14 19:59, Fredo Erxleben wrote:
> Hello everybody,
>
> Since I am working on the conversion from the dump files to the wdtk
> data model, I will have to take apart the "refs" section of the JSON
> representing the stored items.
>
> Now a "refs"-section most likely looks like this:
> (Tried to format it for readability)
>
> "refs":
> [
>   [
>     [
>       "value", 248,
>       "wikibase-entityid", {"entity-type":"item","numeric-id":15241312}
>     ],
>     [
>       "value", 577,
>       "time", {"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"}
>     ]
>   ]
> ]
>
> So I figured out the following: The outer array groups all references.
> The second level array groups information about one reference (so if we
> had multiple references, we would also have multiple second-level
> arrays) and the inner arrays each group one specific information about
> one specific reference (as determined by the second level array they are
> nested in).
>
> Am I correct so far?

Yes, where the "one specific information" in inner arrays is the
encoding of one snak.
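
A minimal Python sketch of walking these three levels, assuming the "refs" value has already been cut out of the item JSON. The function name iter_reference_snaks and the variable names are hypothetical and are not part of wdtk or of the dump format:

```python
import json

# The "refs" value from the example above, kept as raw JSON text.
REFS_JSON = r'''
[
  [
    ["value", 248,
     "wikibase-entityid", {"entity-type":"item","numeric-id":15241312}],
    ["value", 577,
     "time", {"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"}]
  ]
]
'''


def iter_reference_snaks(refs):
    """Yield (reference_index, snak_array) for every snak in a "refs" list."""
    for ref_index, reference in enumerate(refs):  # outer array: all references
        for snak in reference:                    # second level: one reference
            yield ref_index, snak                 # inner array: one snak


refs = json.loads(REFS_JSON)
pairs = list(iter_reference_snaks(refs))
for ref_index, snak in pairs:
    # Print reference index, snak type, property number, primitive type.
    print(ref_index, snak[0], snak[1], snak[2])
```

Both snaks here carry reference index 0, since the example contains a single reference.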

> The integer following the "value"-string denotes what the following
> information is about. So if I read that the value is 577, I know that
> there must be a specification of time, don't I?

The values in the snak arrays are as follows in the case of ValueSnaks:
[0]: snak type ("value" for ValueSnaks; other possible values are
"novalue" and "somevalue")
[1]: snak property ("577" is for "P577")
[2]: primitive type of datavalue (these correspond to the ...Value classes)
[3]: encoding of the primitive datavalue
If you know the datatype of P577, then you could indeed infer that the
primitive value used here must be "time". However, the datatype is not
given in this place in the dump, so it would be impossible to interpret
the dump of one entity without knowing external context information.
This is why the type of the primitive value is explicitly specified.
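
The positions listed above can be sketched in Python as follows. The names Snak and parse_snak are hypothetical, and the handling of "novalue" and "somevalue" is an assumption: the sketch reads only their first two entries, since the layout above is given for ValueSnaks only.

```python
from typing import Any, NamedTuple, Optional


class Snak(NamedTuple):
    snak_type: str             # [0]: "value", "novalue" or "somevalue"
    property_id: str           # [1]: 577 becomes "P577"
    value_type: Optional[str]  # [2]: primitive type, e.g. "time"
    value: Any                 # [3]: encoding of the primitive datavalue


def parse_snak(raw):
    """Turn one inner snak array into a Snak record."""
    snak_type, prop = raw[0], raw[1]
    if snak_type == "value":
        if len(raw) != 4:
            raise ValueError(f"ValueSnak needs 4 entries, got {len(raw)}")
        return Snak(snak_type, f"P{prop}", raw[2], raw[3])
    if snak_type in ("novalue", "somevalue"):
        # Assumption: no datavalue is read for these snak types.
        return Snak(snak_type, f"P{prop}", None, None)
    raise ValueError(f"unknown snak type: {snak_type!r}")


snak = parse_snak([
    "value", 577,
    "time", {"time": "+00000002013-10-28T00:00:00Z", "timezone": 0,
             "before": 0, "after": 0, "precision": 11,
             "calendarmodel": "http://www.wikidata.org/entity/Q1985727"},
])
print(snak.property_id, snak.value_type)
```

The primitive type is taken from position [2] of the array itself, so no lookup of the datatype of P577 is needed.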

> Is there any specific reason why this is done in array form and not in
> a JSON object? (Since the "value"-key is always there one could know
> from its value, what other keys must be available.)

"value" is not a key but an entry that denotes a snak type.

> If yes, why mention the type of information (e.g. "time") again?
> Am I overlooking something?

Answered above. Another important point is that each primitive value can
decide on its encoding locally, without depending on the encoding of
other value types. Therefore, a tool that reads this data cannot "guess"
the primitive type by looking at the encoding only. It seems obvious
that the map
{"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"}
encodes a time value, but one should not assume that such an inference
is always possible. Values of different types could even share the same
encoding.
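
A reader can therefore select the decoder from the explicit type string rather than from the shape of the encoding. The following Python sketch illustrates this under stated assumptions: the decoder functions, the DECODERS table and the returned dictionaries are all hypothetical, and only the two type strings that appear in the example are covered.

```python
def decode_time(encoding):
    # Keeps only two of the fields from the time map shown above.
    return {"kind": "time",
            "time": encoding["time"],
            "precision": encoding["precision"]}


def decode_entity_id(encoding):
    return {"kind": "entity",
            "entity_type": encoding["entity-type"],
            "numeric_id": encoding["numeric-id"]}


# Decoder chosen by the explicit type string at position [2] of the snak.
DECODERS = {
    "time": decode_time,
    "wikibase-entityid": decode_entity_id,
}


def decode_datavalue(value_type, encoding):
    """Decode an encoding using its declared type, never its shape."""
    decoder = DECODERS.get(value_type)
    if decoder is None:
        raise ValueError(f"no decoder registered for type {value_type!r}")
    return decoder(encoding)


time_value = decode_datavalue(
    "time",
    {"time": "+00000002013-10-28T00:00:00Z", "timezone": 0, "before": 0,
     "after": 0, "precision": 11,
     "calendarmodel": "http://www.wikidata.org/entity/Q1985727"},
)
entity_value = decode_datavalue(
    "wikibase-entityid", {"entity-type": "item", "numeric-id": 15241312}
)
```

Because each decoder only sees encodings of its own type, each value type stays free to choose or change its encoding without affecting the others.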
Cheers,
Markus