New subject: claim IDs in wikidata

20 Dec 2014

Hi!

...
  The best place for this kind of question would be the
wikidata-tech mailing list
 &lt;wikidata-tech(a)lists.wikimedia.org&gt;rg>. It would probably be a good idea if you
 (and whoever else deals with wikidata on the technical level) were subscribed
 there. It's pretty low traffic. 
Thanks, I've sent the subscription request and adding it to the CC.
Still learning the right places to go for things :)

...
  Statement IDs are GUIDs (with the Item ID prefixed),
and they do not change when
 the Statement changes (otherwise, they would be hashes, not IDs - References are
 currently handled by hash). 
...
 From the export/import point of view, I think I'd
prefer immutable claims (i.e. ID changes each time claim changes) as they are easier
to
handle, but as it is not the case, I can switch to using the content
hash instead. The performance impact (time spent calculating the hashes)
should not be too big.

...
  One thing that would be rather easy to do is to make
JSON dumps of just the
 items that changed in the last X hours. But that wouldn't tell you wich
 statements changed. 
I think for imports the best thing would be to have real diffs - i.e.
list of claims/item fields that were added/removed/changed - but if
that's not feasible, list of changed items would be great too. We may
want this with even more frequency than hours. Item data is not that
big, so loading it and running the diff manually would still be
workable. It would be slightly slower for big items (since each claim
for the item has to be examined) and requires maintaining additional
data structure to efficiently enumerate the claims, but it should be
still workable.

Thanks,
Stas

Re: [Wikidata-tech] claim IDs in wikidata