Ashar Voultoiz wrote: <snip attempt to rescue the thing>
So when some site wants to use WikiData data, it sends a query to the WikiData server together with its own internal reference (e.g., the name of the Wikipedia article and its language). WikiData then sends back the requested data and the WikiData internal reference.
When a set of data is changed, WikiData sends a ping with the update to every site referencing that set of data. The site using the data then answers WikiData in one of three ways: 1/ data change acknowledged. 2/ no more need for this data, remove me. 3/ it doesn't answer.
If it doesn't answer, there could be a system that queues the ping so it can be sent again later (and eventually be dropped after x days).
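A minimal sketch of the ping scheme described above, to make the three outcomes and the retry queue concrete. Everything named here is hypothetical: `PingServer`, the `send` transport callback, and `max_age` (standing in for the "x days" limit) are illustration only, not part of any existing WikiData code.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Reply(Enum):
    ACK = 1          # 1/ data change acknowledged
    UNSUBSCRIBE = 2  # 2/ no more need for this data, remove me
    NO_ANSWER = 3    # 3/ the site did not answer


@dataclass
class PingServer:
    # Hypothetical transport: send(site, data_id, payload) -> Reply
    send: Callable[[str, str, dict], Reply]
    # Seconds after which an unanswered ping is dropped (the "x days")
    max_age: float
    # data_id -> set of sites referencing that set of data
    subscribers: dict = field(default_factory=dict)
    # Unanswered pings: (time of first attempt, site, data_id, payload)
    pending: list = field(default_factory=list)

    def subscribe(self, site, data_id):
        self.subscribers.setdefault(data_id, set()).add(site)

    def _deliver(self, site, data_id, payload, first_attempt):
        reply = self.send(site, data_id, payload)
        if reply is Reply.UNSUBSCRIBE:
            self.subscribers.get(data_id, set()).discard(site)
        elif reply is Reply.NO_ANSWER:
            # Queue the ping so it can be sent again later.
            self.pending.append((first_attempt, site, data_id, payload))

    def data_changed(self, data_id, payload, now=None):
        now = time.time() if now is None else now
        # Copy the set: _deliver may remove sites while we iterate.
        for site in list(self.subscribers.get(data_id, ())):
            self._deliver(site, data_id, payload, now)

    def retry_pending(self, now=None):
        now = time.time() if now is None else now
        queued, self.pending = self.pending, []
        for first_attempt, site, data_id, payload in queued:
            if now - first_attempt > self.max_age:
                continue  # too old: drop the ping
            self._deliver(site, data_id, payload, first_attempt)
```

Note that this only covers the case where each site references a known, fixed set of data; the server always knows whom to ping.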
That will work nicely if we restrict WikiData access to "show me that specific row from that specific table in that specific database", which is fine for "show me data on that species".
But as soon as we allow queries to return lists (e.g., "show me all species of that family"), we cannot do that anymore. Suppose someone adds a species to WikiData. How can we know that a wikipedia page needs to be updated?
Only one way to do that:
* Store the original query, the wikipedia page for that query, and its results
* On changing any WikiData, rerun *all* these queries, compare their results to the stored ones, and notify wikipedias if necessary
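The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StoredQueryNotifier`, the `notify` callback, and the representation of a query as a plain function over a list of rows are all made up for the example, and the sample rows are not real WikiData content.

```python
class StoredQueryNotifier:
    def __init__(self, notify):
        # Hypothetical callback: notify(page, new_results)
        self.notify = notify
        # wikipedia page -> (original query, last known results)
        self.stored = {}

    def register(self, page, query, data):
        # Store the query, the page it belongs to, and its current results.
        self.stored[page] = (query, query(data))

    def on_change(self, data):
        """Rerun *all* stored queries; return how many were rerun."""
        reruns = 0
        for page, (query, old_results) in list(self.stored.items()):
            new_results = query(data)
            reruns += 1
            if new_results != old_results:
                self.stored[page] = (query, new_results)
                self.notify(page, new_results)
        return reruns


def species_of(family):
    # Assumed query shape: "show me all species of that family".
    def query(rows):
        return sorted(r["species"] for r in rows if r["family"] == family)
    return query


# Sample rows, for illustration only.
data = [
    {"species": "Panthera leo", "family": "Felidae"},
    {"species": "Canis lupus", "family": "Canidae"},
]
notified = []
notifier = StoredQueryNotifier(lambda page, results: notified.append(page))
notifier.register("Felidae", species_of("Felidae"), data)
notifier.register("Canidae", species_of("Canidae"), data)

# Someone adds a species: every stored query is rerun, one page is notified.
data.append({"species": "Panthera tigris", "family": "Felidae"})
reruns = notifier.on_change(data)
```

In this sketch a single change triggers as many reruns as there are stored queries, even though only one page actually needed the update.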
Rerunning a million queries for each data change will dwarf the possible traffic generated from pull (pull isn't really better either; that's the dilemma).
Also, pushing will require extensive infrastructure on the recipient's site, which is not necessarily a wikimedia project (the data should be available to everyone).
Magnus