If that's too slow, just query the recentchanges table directly. Or, if you want
to be more Wikibase-centric, query the wb_changes table; it's conceptually similar.
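
For illustration, a rough sketch of what such a query could look like
(untested; the column names change_object_id and change_time are my assumption
about the wb_changes schema, and the connection details are placeholders):

import pymysql

def changed_entity_ids(conn, since_timestamp):
    # Entity IDs with a change recorded at or after `since_timestamp`
    # (MediaWiki timestamp format, e.g. "20141220000000").
    with conn.cursor() as cur:
        cur.execute(
            "SELECT DISTINCT change_object_id FROM wb_changes"
            " WHERE change_time >= %s",
            (since_timestamp,),
        )
        return [row[0] for row in cur.fetchall()]

conn = pymysql.connect(host="localhost", db="wikidatawiki",
                       read_default_file="~/.my.cnf")
print(changed_entity_ids(conn, "20141220000000"))
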
HTH
Daniel
On 21.12.2014 12:35, Jan Zerebecki wrote:
It might be a good idea to add an API that outputs the entity IDs that changed
since time x or revision y. For older data it could refer to index files for the
dumps. That probably makes more sense than creating a dump each minute.
hourly dumps:
https://phabricator.wikimedia.org/T85100
changed entities API:
https://phabricator.wikimedia.org/T85103
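
Until something like that exists, a rough approximation is already possible
with the generic recentchanges API module; a simplified sketch (no
continuation handling, and it only looks at the Item namespace):

import requests

def entity_ids_changed_since(start_iso):
    # Approximate "entity IDs changed since time x" via the MediaWiki
    # recentchanges module on wikidata.org; namespace 0 holds Items.
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start_iso,   # e.g. "2014-12-21T00:00:00Z"
        "rcdir": "newer",
        "rcnamespace": 0,
        "rcprop": "title|timestamp",
        "rclimit": "max",
        "format": "json",
    }
    r = requests.get("https://www.wikidata.org/w/api.php", params=params)
    changes = r.json()["query"]["recentchanges"]
    return sorted({c["title"] for c in changes})
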
On Sat, Dec 20, 2014 at 9:03 PM, Stas Malyshev <smalyshev@wikimedia.org> wrote:
Hi!
The best place for this kind of question would be the wikidata-tech mailing list
<wikidata-tech@lists.wikimedia.org>. It would probably be a good idea if you
(and whoever else deals with Wikidata on the technical level) were subscribed
there. It's pretty low traffic.
Thanks, I've sent the subscription request and am adding it to the CC.
Still learning the right places to go for things :)
Statement IDs are GUIDs (with the Item ID prefixed), and they do not change
when the Statement changes (otherwise, they would be hashes, not IDs; References
are currently handled by hash).
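
To make that concrete, roughly what this looks like in the entity JSON (field
names as in the Wikibase JSON format; the values here are invented or elided):

statement = {
    # Item ID + "$" + GUID; stays the same when the statement is edited.
    "id": "Q42$F078E5B3-F9A8-480E-B7AC-D97778CBBEF9",
    "mainsnak": {"snaktype": "value", "property": "P31", "datavalue": "..."},
    "references": [
        # The hash changes whenever the reference content changes.
        {"hash": "fa278ebfc458360e5aed63d5058cca83c46134f1", "snaks": "..."},
    ],
}
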
From the export/import point of view, I think I'd prefer immutable claims
(i.e. the ID changes each time the claim changes) as they are easier to handle,
but since that is not the case, I can switch to using the content hash instead.
The performance impact (time spent calculating the hashes) should not be too big.
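
A minimal sketch of what I mean by a content hash (my own approximation for
import bookkeeping, not whatever Wikibase computes internally): hash a
canonical JSON serialization of the claim with the GUID stripped out.

import hashlib
import json

def claim_content_hash(claim):
    # Hash only the content, not the GUID, so edits to the claim change
    # the hash while the statement ID stays stable.
    normalized = dict(claim)
    normalized.pop("id", None)
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
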
One thing that would be rather easy to do is to make JSON dumps of just the
items that changed in the last X hours. But that wouldn't tell you which
statements changed.
I think for imports the best thing would be to have real diffs - i.e. a list of
claims/item fields that were added/removed/changed - but if that's not feasible,
a list of changed items would be great too. We may want this even more
frequently than hourly. Item data is not that big, so loading it and running
the diff manually would still be workable. It would be slightly slower for big
items (since each claim for the item has to be examined) and requires
maintaining an additional data structure to efficiently enumerate the claims,
but it should still be workable.
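
Roughly what I have in mind for the manual diff, keyed by statement GUID
(reusing the content hash sketch from above; only statements are compared,
not labels or sitelinks):

def diff_statements(old_item, new_item):
    # Index every statement of an item JSON snapshot by its GUID.
    def by_id(item):
        return {c["id"]: c
                for claims in item.get("claims", {}).values()
                for c in claims}
    old, new = by_id(old_item), by_id(new_item)
    added   = [i for i in new if i not in old]
    removed = [i for i in old if i not in new]
    changed = [i for i in new if i in old
               and claim_content_hash(new[i]) != claim_content_hash(old[i])]
    return added, removed, changed
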
Thanks,
Stas
--
Best regards,
Jan Zerebecki
Software Engineer
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de
Imagine a world in which every single human being can freely share in the sum
of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations at the Amtsgericht Berlin-Charlottenburg
under number 23855 B. Recognized as charitable by the Finanzamt für
Körperschaften I Berlin, tax number 27/681/51985.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.