Ashar Voultoiz wrote:
<snip attempt to rescue the thing>
So when some site wants to use WikiData, it sends a query to the
WikiData server, keyed by its internal reference (e.g. the name of
the Wikipedia article and its language). WikiData then sends back the
requested data together with WikiData's internal reference.
When a WikiData entry is changed, the server sends a ping, carrying
the update, to every site referencing that set of data. Each site
using the data then answers WikiData with a code:
1/ data change acknowledged.
2/ no more need for this data, remove me.
3/ doesn't answer.
If a site doesn't answer, there could be a system that queues the ping
so it can be sent again later (and eventually drops it after x days).
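For concreteness, a rough sketch of that ping/ack loop (all names here
are made up; it assumes each subscriber answers a ping with one of the
three codes above):

    import time

    ACKNOWLEDGED = 1   # data change acknowledged
    UNSUBSCRIBE  = 2   # no more need for this data, remove me
    NO_ANSWER    = 3   # site did not answer

    MAX_AGE_DAYS = 7   # queued pings are dropped after x days

    class PingQueue:
        """Re-delivery queue for sites that did not answer a ping."""
        def __init__(self):
            self.pending = []  # (queued_at, site, update)

        def push(self, site, update):
            self.pending.append((time.time(), site, update))

        def retry(self, send):
            still_pending = []
            for queued_at, site, update in self.pending:
                if time.time() - queued_at > MAX_AGE_DAYS * 86400:
                    continue  # too old, drop it
                if send(site, update) == NO_ANSWER:
                    still_pending.append((queued_at, site, update))
            self.pending = still_pending

    def notify_subscribers(subscribers, update, send, queue):
        """Ping every site referencing the changed set of data."""
        for site in list(subscribers):
            code = send(site, update)
            if code == UNSUBSCRIBE:
                subscribers.remove(site)   # "remove me"
            elif code == NO_ANSWER:
                queue.push(site, update)   # try again later
            # ACKNOWLEDGED needs no further action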
That will work nicely if we restrict WikiData access to "show me that
specific row from that specific table in that specific database",
which is fine for "show me data on that species".
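In that restricted model, a pull is nothing more than a keyed
single-row lookup; roughly (table and column names are hypothetical):

    import sqlite3

    def fetch_record(db_path, table, article, lang):
        """Pull one specific row, keyed by the caller's internal
        reference (Wikipedia article name + language)."""
        con = sqlite3.connect(db_path)
        try:
            # a table name cannot be a placeholder; assume it is
            # validated against a whitelist upstream
            cur = con.execute(
                f"SELECT * FROM {table} WHERE article = ? AND lang = ?",
                (article, lang),
            )
            return cur.fetchone()
        finally:
            con.close()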
But as soon as we allow queries to return lists (e.g., "show me all
species of that family"), that no longer works. Suppose someone adds
a species to WikiData. How can we know that a Wikipedia page needs to
be updated?
Only one way to do that (sketched below):
* Store the original query, the Wikipedia page it came from, and its
results
* On every WikiData change, rerun *all* these queries, compare their
results to the stored ones, and notify the Wikipedias if necessary
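A rough sketch of that bookkeeping, where run_query and notify are
hypothetical hooks into the query engine and the notification channel:

    # query text -> (Wikipedia page that depends on it, last known result)
    stored_queries = {}

    def register(query, page, run_query):
        """Remember the query, its owning page, and its current result."""
        stored_queries[query] = (page, run_query(query))

    def on_any_wikidata_change(run_query, notify):
        """After *any* edit: rerun every stored query and notify the
        owning page whenever the result set has changed."""
        for query, (page, old_result) in stored_queries.items():
            new_result = run_query(query)
            if new_result != old_result:
                stored_queries[query] = (page, new_result)
                notify(page, new_result)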
Rerunning a million queries for each data change will dwarf any
traffic that pull could generate (pull isn't really better either;
that's the dilemma).
Also, pushing will require extensive infrastructure on the recipient's
site, which is not necessarily a Wikimedia project (the data should be
available to everyone).
Magnus