Hi all,
Wikidata aims to centralize structured data from the Wikipedias in
one central wiki, starting with the language links. The main technical
challenge we face is implementing the data flow efficiently on the
WMF infrastructure. We invite peer review of our design.
I am trying to give a simplified overview here. The full description
is on-wiki: <http://meta.wikimedia.org/wiki/Wikidata/Notes/Change_propagation>
There are a number of design choices. Here is our current thinking:
* Every change on the language links in Wikidata is stored in the
wb_changes table on Wikidata
* A script (or several, depending on load), run per wiki cluster,
checks wb_changes, gets a batch of changes it has not seen yet, and
creates jobs for all affected pages on all wikis of the given
cluster
* When a job is executed, the affected page is re-rendered and an
entry is added to the local recentchanges
* For re-rendering the page, the wiki needs access to the data.
We are not sure how best to do this: replicate it per cluster,
or keep it in one place only?
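To make the polling step concrete, here is a rough sketch of the
per-cluster dispatcher loop described above. The table and column names
(wb_changes, change_id, page_title) and the job format are illustrative
only, not the actual Wikibase schema; an in-memory SQLite table stands
in for the real Wikidata database.

```python
import sqlite3

def dispatch_batch(conn, last_seen_id, batch_size=100):
    """Fetch changes newer than last_seen_id and turn them into jobs.

    Returns the list of jobs and the new high-water mark, so the
    dispatcher can persist its position between runs.
    """
    rows = conn.execute(
        "SELECT change_id, page_title FROM wb_changes "
        "WHERE change_id > ? ORDER BY change_id LIMIT ?",
        (last_seen_id, batch_size),
    ).fetchall()
    # One re-render job per affected page; a real dispatcher would
    # push these onto the cluster's job queue instead of returning them.
    jobs = [{"page": title, "change": cid} for cid, title in rows]
    new_last_seen = rows[-1][0] if rows else last_seen_id
    return jobs, new_last_seen

# Demo with an in-memory stand-in for the wb_changes table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_changes (change_id INTEGER PRIMARY KEY, page_title TEXT)"
)
conn.executemany(
    "INSERT INTO wb_changes VALUES (?, ?)",
    [(1, "Berlin"), (2, "Paris"), (3, "Berlin")],
)
jobs, last_seen = dispatch_batch(conn, last_seen_id=0)
```

The high-water mark (last_seen_id) is what lets several such scripts
run independently, one per cluster, each at its own pace.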
We appreciate comments. A lot. This thing is make-or-break for the
whole project, and it is getting kinda urgent.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 |
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable
by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.