Hi all,
Wikidata aims to centralize structured data from the Wikipedias in one central wiki, starting with the language links. The main technical challenge we face is implementing the data flow efficiently on the WMF infrastructure. We invite peer review of our design.
Below is a simplified overview; the full description is on-wiki: http://meta.wikimedia.org/wiki/Wikidata/Notes/Change_propagation
There are a number of design choices. Here is our current thinking:
* Every change to the language links on Wikidata is recorded in the wb_changes table on Wikidata.
* A script (or several, depending on load), run per wiki cluster, polls wb_changes, fetches a batch of changes it has not seen yet, and creates jobs for all affected pages on all wikis in the given cluster (see the sketch after this list).
* When the jobs are executed, the respective page is re-rendered and an entry is added to the local recentchanges.
* For re-rendering the page, the wiki needs access to the data. We are not sure how to do this best: replicate it per cluster, or keep it in one place only?
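To make the dispatch step more concrete, here is a minimal sketch in Python (the real implementation would be PHP inside MediaWiki). Only wb_changes is a name from the proposal; the dispatch_state, item_usage, and job_queue tables and all column names are assumptions invented for illustration:

    import sqlite3

    # Sketch only: wb_changes is the table name from the proposal; everything
    # else (dispatch_state, item_usage, job_queue, and all column names) is an
    # assumption made up for this illustration.

    def dispatch_batch(db, cluster, batch_size=100):
        """Pick up changes this cluster has not seen yet, enqueue one
        re-render job per affected page, then advance the pointer."""
        cur = db.cursor()
        row = cur.execute(
            "SELECT last_seen FROM dispatch_state WHERE cluster = ?",
            (cluster,)).fetchone()
        last_seen = row[0] if row else 0

        changes = cur.execute(
            "SELECT change_id, item_id FROM wb_changes"
            " WHERE change_id > ? ORDER BY change_id LIMIT ?",
            (last_seen, batch_size)).fetchall()
        if not changes:
            return 0

        for change_id, item_id in changes:
            # Fan out: one job per local page that uses the changed item.
            pages = cur.execute(
                "SELECT wiki, page FROM item_usage"
                " WHERE item_id = ? AND cluster = ?",
                (item_id, cluster)).fetchall()
            for wiki, page in pages:
                cur.execute(
                    "INSERT INTO job_queue (wiki, page, change_id)"
                    " VALUES (?, ?, ?)", (wiki, page, change_id))

        # Advance last_seen only after the whole batch is queued, so a
        # crash re-dispatches the batch instead of silently dropping it.
        cur.execute(
            "INSERT OR REPLACE INTO dispatch_state (cluster, last_seen)"
            " VALUES (?, ?)", (cluster, changes[-1][0]))
        db.commit()
        return len(changes)

    if __name__ == "__main__":
        db = sqlite3.connect(":memory:")
        db.executescript("""
            CREATE TABLE wb_changes (change_id INTEGER PRIMARY KEY, item_id INT);
            CREATE TABLE item_usage (item_id INT, cluster TEXT, wiki TEXT, page TEXT);
            CREATE TABLE dispatch_state (cluster TEXT PRIMARY KEY, last_seen INT);
            CREATE TABLE job_queue (wiki TEXT, page TEXT, change_id INT);
        """)
        db.execute("INSERT INTO wb_changes VALUES (1, 42)")
        db.execute("INSERT INTO item_usage VALUES (42, 's1', 'enwiki', 'Berlin')")
        print(dispatch_batch(db, "s1"), "changes dispatched")

The per-cluster continuation pointer lets several dispatchers run independently at their own pace; the flip side is that a crashed dispatcher may enqueue the same batch twice, so the re-render jobs need to be idempotent.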
We appreciate comments. A lot. This thing is make-or-break for the whole project, and it is getting kinda urgent.
Cheers, Denny