Hoi, One of the KEY reasons to have Wikidata is that it DOES update when there is a change in the data. For instance, how many Wikipedias have an article on my home town of Almere that says Mrs Jorritsma is the mayor ... She will not be mayor forever ... There are many villages, towns and cities like Almere.
I positively do not like the idea of all the wasted effort when a push-based Wikidata can be and should be the solution. Thanks, Gerard
On 23 April 2012 16:09, Petr Bena benapetr@gmail.com wrote:
I mean, in simple words:
Your idea: when the data on wikidata is changed, the new content is pushed to all local wikis / somewhere
My idea: local wikis retrieve data from the wikidata db directly; no need to push anything on change
On Mon, Apr 23, 2012 at 4:07 PM, Petr Bena benapetr@gmail.com wrote:
I think it would be much better if the local wikis that are supposed to access this data had some sort of client extension that let them render the content using the Wikidata db directly. That would be much simpler and faster
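(For illustration, a rough Python sketch of what such a pull-based read path could look like; the wd_sitelinks table and its columns are invented for this example and are not the actual Wikidata schema.)

    # Sketch only: 'wd_sitelinks' and its columns are invented for illustration.
    import sqlite3

    def get_language_links(central_db: sqlite3.Connection, item_id: int) -> dict:
        """Read the interlanguage links for one item straight from the central db."""
        rows = central_db.execute(
            "SELECT site, title FROM wd_sitelinks WHERE item_id = ?", (item_id,))
        return {site: title for site, title in rows}

    def render_language_links(central_db: sqlite3.Connection, item_id: int) -> str:
        # Every (re-)render queries the central database; there is no local copy
        # that needs to be kept in sync, which is what makes this approach simple.
        links = get_language_links(central_db, item_id)
        return "\n".join(f"[[{site}:{title}]]" for site, title in sorted(links.items()))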
On Mon, Apr 23, 2012 at 2:45 PM, Daniel Kinzler daniel@brightbyte.de wrote:
Hi all!
The wikidata team has been discussing how to best make data from wikidata available on local wikis. Fetching the data via HTTP whenever a page is re-rendered doesn't seem prudent, so we (mainly Jeroen) came up with a push-based architecture.
The proposal is at <http://meta.wikimedia.org/wiki/Wikidata/Notes/Caching_investigation#Proposal...>; I have copied it below too.
Please have a look and let us know if you think this is viable, and which of the two variants you deem better!
Thanks, -- daniel
PS: Please keep the discussion on wikitech-l, so we have it all in one place.
== Proposal: HTTP push to local db storage ==

* Every time an item on Wikidata is changed, an HTTP push is issued to all subscribing clients (wikis)
** initially, "subscriptions" are just entries in an array in the configuration.
** Pushes can be done via the job queue.
** pushing is done via the mediawiki API, but other protocols such as PubSub Hubbub / AtomPub can easily be added to support 3rd parties.
** pushes need to be authenticated, so we don't get malicious crap. Pushes should be done using a special user with a special user right.
** the push may contain either the full set of information for the item, or just a delta (diff) + hash for an integrity check (in case an update was missed).

* When the client receives a push, it does two things:
*# write the fresh data into a local database table (the local wikidata cache)
*# invalidate the (parser) cache for all pages that use the respective item (for now we can assume that we know this from the language links)
*#* if we only update language links, the page doesn't even need to be re-parsed: we just update the languagelinks in the cached ParserOutput object.

* when a page is rendered, interlanguage links and other info is taken from the local wikidata cache. No queries are made to wikidata during parsing/rendering.

* In case an update is missed, we need a mechanism to allow requesting a full purge and re-fetch of all data on the client side, rather than just waiting until the next push, which might very well take a very long time to happen.
** There needs to be a manual option for when someone detects this; maybe action=purge can be made to do this. Simple cache invalidation however shouldn't pull info from wikidata.
** A time-to-live could be added to the local copy of the data so that it is updated by a periodic pull, so the data does not stay stale indefinitely after a failed push.
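(For illustration, a rough Python sketch of the client-side handling described above. The table, field and function names are invented; this is not actual MediaWiki / Wikibase code.)

    # Rough sketch of the client-side push handling described above.
    # All table, field and function names are invented for illustration.
    import hashlib
    import json
    import sqlite3

    def item_hash(data: dict) -> str:
        return hashlib.sha1(json.dumps(data, sort_keys=True).encode()).hexdigest()

    def handle_push(local_db: sqlite3.Connection, payload: dict) -> None:
        item_id = payload["item_id"]
        expected = payload["hash"]               # hash of the full, current item data

        # Apply the pushed delta to our cached copy (here the "diff" is simply
        # a dict of changed fields).
        current = load_cached_item(local_db, item_id)
        current.update(payload.get("diff", {}))

        if item_hash(current) != expected:
            # Mismatch means an earlier push was missed: fall back to a full re-fetch.
            current = request_full_refetch(item_id)

        # 1. Write the fresh data into the local wikidata cache table.
        local_db.execute("REPLACE INTO wd_cache (item_id, data) VALUES (?, ?)",
                         (item_id, json.dumps(current, sort_keys=True)))
        local_db.commit()

        # 2. Invalidate the parser cache for all pages that use this item
        #    (usage assumed to be known from the language links).
        for (page_id,) in local_db.execute(
                "SELECT page_id FROM wd_usage WHERE item_id = ?", (item_id,)):
            invalidate_parser_cache(page_id)

    def load_cached_item(local_db, item_id) -> dict:
        row = local_db.execute("SELECT data FROM wd_cache WHERE item_id = ?",
                               (item_id,)).fetchone()
        return json.loads(row[0]) if row else {}

    def request_full_refetch(item_id) -> dict:
        return {}   # placeholder: would pull the complete item from wikidata

    def invalidate_parser_cache(page_id) -> None:
        pass        # placeholder: would purge the cached ParserOutput for the page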
=== Variation: shared database tables ===

Instead of having a local wikidata cache on each wiki (which may grow big - a first guesstimate of Jeroen and Reedy is up to 1TB total, for all wikis), all client wikis could access the same central database table(s) managed by the wikidata wiki.

* this is similar to the way the globalusage extension tracks the usage of commons images
* whenever a page is re-rendered, the local wiki would query the table in the wikidata db. This means a cross-cluster db query whenever a page is rendered, instead of a local query.
* the HTTP push mechanism described above would still be needed to purge the parser cache when needed. But the push requests would not need to contain the updated data; they may just be requests to purge the cache.
* the ability for full HTTP pushes (using the mediawiki API or some other interface) would still be desirable for 3rd party integration.

* This approach greatly lowers the amount of space used in the database
* it doesn't change the number of http requests made
** it does however reduce the amount of data transferred via http (but not by much, at least not compared to pushing diffs)
* it doesn't change the number of database requests, but it introduces cross-cluster requests
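(For illustration, a rough Python sketch of the shared-table variation: the render path would query the central wikidata database directly, as in the earlier pull sketch, and the push would carry no data, only a purge request. Table and function names are again hypothetical.)

    # Sketch of the shared-table variation; 'wd_usage' and the helpers are invented.
    import sqlite3

    def handle_purge_push(local_db: sqlite3.Connection, item_id: int) -> None:
        # The push no longer contains item data; it only tells the client which
        # pages need their parser cache purged. The fresh data is read from the
        # central wikidata tables at the next render (a cross-cluster query).
        for (page_id,) in local_db.execute(
                "SELECT page_id FROM wd_usage WHERE item_id = ?", (item_id,)):
            invalidate_parser_cache(page_id)

    def invalidate_parser_cache(page_id) -> None:
        pass  # placeholder for the real parser cache purge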