[Moving discussion to wikidata-tech, since it is of general interest.
The question was how the grouped statements seen in the UI are computed
from the flat list of statements found in the dumps]
Hi Fredo,
StatementGroups are a new structure that we decided to introduce to the
data model last week (and which did not exist so far). They represent
statements that are grouped by property, as can be seen in the user
interface. The JSON generated by the Web API also groups statements by
property.
As you noticed, the internal JSON format found in the dumps does not
have such groups -- there is just a list of statements there, which may
not even have statements of the same property next to each other. The
order of groups shown in the UI and the API results is computed from
this internal list. Having StatementGroups as an explicit construct will
allow the order of the groups to be controlled in a saner way.
Currently the groups are created from the plain list by putting the
group for a property "at the first position where a statement of this
property occurs in the list". More algorithmically:
* Input: a list of statements
* Output: a list of lists of statements
* result = [] // empty list
* Iterate over all statements in the input:
** if property of statement has no group in result yet:
*** create new group and append it to the result
** add the statement to its group
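To make the steps above concrete, here is a minimal Python sketch. It
assumes that each statement is a dict in the internal dump format and
that the property id is the second entry of the "m" array (as in the
example quoted below); the function name and the trimmed-down sample
statements are made up for illustration and are not Wikibase code.

```python
def group_by_property(statements):
    """Group a flat statement list by property, in first-occurrence order."""
    groups = []      # the result: a list of lists of statements
    group_of = {}    # property id -> that property's group (same list object)
    for statement in statements:
        prop = statement["m"][1]  # assumed position of the property id
        if prop not in group_of:
            # First statement of this property: its group goes at the end.
            group_of[prop] = []
            groups.append(group_of[prop])
        group_of[prop].append(statement)
    return groups


# Hypothetical, trimmed-down statements: P107, P646, then P107 again.
flat = [
    {"m": ["value", 107, "string", "a"], "rank": 1},
    {"m": ["value", 646, "string", "b"], "rank": 1},
    {"m": ["value", 107, "string", "c"], "rank": 1},
]
grouped = group_by_property(flat)
```

With this input, the group for property 107 comes first and contains
both of its statements in their original order, followed by the group
for property 646.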
The statements within each group are reordered in the UI, so that the
ones with preferred rank are on top, followed by the ones of normal
rank, followed by the ones of deprecated rank. The relative order of
statements with the same rank should be preserved in this operation.
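As a rough sketch of this display ordering in Python: a stable sort by
rank is enough. I assume here that ranks are encoded numerically with
preferred = 2, normal = 1 and deprecated = 0 (the dump example below
only shows "rank":1, so please check this encoding against your data);
the function name and the sample statements are hypothetical.

```python
def order_for_display(group):
    """Return the statements of one group sorted by descending rank."""
    # sorted() is stable, so statements with equal rank keep their
    # relative order. It returns a new list; the stored order of the
    # group itself is not modified.
    return sorted(group, key=lambda statement: -statement["rank"])


# Hypothetical group: assumed encoding 2 = preferred, 1 = normal, 0 = deprecated.
group = [
    {"g": "s1", "rank": 1},
    {"g": "s2", "rank": 0},
    {"g": "s3", "rank": 2},
    {"g": "s4", "rank": 1},
]
shown = order_for_display(group)
```

Here the preferred statement s3 is shown first, then s1 and s4 in their
original relative order, and the deprecated s2 last.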
We do not have a representation for these rank-based groups in the data
model (it seemed not as critical as the StatementGroups), so the
statements in each group should just be kept in their original order.
This is also important for moving groups, since the group can currently
only be moved by moving its first statement in the internal dump list
(which may not have the highest rank). This will hopefully at some point
get easier with a new API action for moving groups. Reordering
statements into groups as described above is not significant for query
answering or for display -- in fact Wikibase does this internally on
some edits. So this preprocessing does not really change the data.
In the future, I hope that the rank-based ordering will also be
maintained within statement groups.
I hope this answers the original question :-)
Cheers,
Markus
On 19/02/14 18:33, Fredo Erxleben wrote:
> Hello everybody,
>
> I encountered a question when trying to take apart statements:
>
> So far I encountered JSON looking like this:
> "claims":
> [
> {
> "m":["value",107,"wikibase-entityid",{"entity-type":"item","numeric-id":618123}],
>
> "q":[],
> "g":"q58404$773743FD-0D4F-48D0-ACAE-028B5D37CA25",
> "rank":1,
> "refs":[[["value",143,"wikibase-entityid",{"entity-type":"item","numeric-id":10000}]]]
>
> },
> … … …
> {
> "m":["value",646,"string","\/m\/0f0_k6"],
> "q":[],
> "g":"Q58404$12E30FC1-F54F-44D4-91AD-99C453935FD4",
> "rank":1,
> "refs":[[["value",248,"wikibase-entityid",{"entity-type":"item","numeric-id":15241312}],["value",577,"time",{"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"}]]]
>
> }
> ]
>
> which is just one StatementGroup with some statements, as far as I
> understand it.
> Now, how would it look if the Item has multiple statement groups? Would
> the value of the "claims" key then not be an array of statements but
> an array of arrays of statements? Or do all Items have only one
> statement group?
>
> Kind regards
> -- Fredo
>