Hoi,
One of the KEY reasons to have Wikidata is that it DOES update when there
is a change in the data. For instance, how many Wikipedias have an article
on my home town of Almere that says Mrs Jorritsma is the mayor?
She will not be mayor forever ... There are many villages, towns and
cities like Almere.
I positively do not like the idea of all the wasted effort when a push-based
Wikidata can be and should be the solution.
Thanks,
Gerard
On 23 April 2012 16:09, Petr Bena <benapetr(a)gmail.com> wrote:
I mean, in simple words:
Your idea: when the data on Wikidata is changed, the new content is
pushed to all local wikis / somewhere.
My idea: local wikis retrieve data from the Wikidata db directly; no need
to push anything on change.
On Mon, Apr 23, 2012 at 4:07 PM, Petr Bena <benapetr(a)gmail.com> wrote:
I think it would be much better if the local wikis that are supposed to
access this had some sort of client extension which would allow them to
render the content using the db of wikidata. That would be much simpler
and faster.
On Mon, Apr 23, 2012 at 2:45 PM, Daniel Kinzler <daniel(a)brightbyte.de>
wrote:
> Hi all!
>
> The wikidata team has been discussing how to best make data from wikidata
> available on local wikis. Fetching the data via HTTP whenever a page is
> re-rendered doesn't seem prudent, so we (mainly Jeroen) came up with a
> push-based architecture.
>
> The proposal is at
> <http://meta.wikimedia.org/wiki/Wikidata/Notes/Caching_investigation#Proposa…>,
> I have copied it below too.
>
> Please have a look and let us know if you think this is viable, and which of the
> two variants you deem better!
>
> Thanks,
> -- daniel
>
> PS: Please keep the discussion on wikitech-l, so we have it all in one place.
>
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> == Proposal: HTTP push to local db storage ==
>
> * Every time an item on Wikidata is changed, an HTTP push is issued to all
> subscribing clients (wikis)
> ** initially, "subscriptions" are just entries in an array in the configuration.
> ** Pushes can be done via the job queue.
> ** pushing is done via the mediawiki API, but other protocols such as
> PubSubHubbub / AtomPub can easily be added to support 3rd parties.
> ** pushes need to be authenticated, so we don't get malicious crap. Pushes
> should be done using a special user with a special user right.
> ** the push may contain either the full set of information for the item, or just
> a delta (diff) + hash for integrity check (in case an update was missed).
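A rough sketch of what such a delta-plus-hash push payload could look like (all names and the wire format are hypothetical; the proposal does not fix any of this):

```python
import hashlib
import json

def make_push(item_id, old_data, new_data):
    """Build a delta-style push: only the changed keys travel over the
    wire, plus a hash of the full new item so the receiving client can
    detect that an earlier update was missed."""
    delta = {k: v for k, v in new_data.items() if old_data.get(k) != v}
    full_hash = hashlib.sha1(
        json.dumps(new_data, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"item": item_id, "delta": delta, "hash": full_hash}

old = {"label": "Almere", "mayor": "Jorritsma"}
new = {"label": "Almere", "mayor": "Weerwind"}
push = make_push("Q1234", old, new)
# only the changed field is in the delta
print(push["delta"])
```

On the Wikidata side, one such payload per subscribed wiki would be queued as a job, so a single edit never blocks on dozens of outgoing HTTP requests.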
>
> * When the client receives a push, it does two things:
> *# write the fresh data into a local database table (the local wikidata cache)
> *# invalidate the (parser) cache for all pages that use the respective item (for
> now we can assume that we know this from the language links)
> *#* if we only update language links, the page doesn't even need to be
> re-parsed: we just update the languagelinks in the cached ParserOutput object.
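The client side of that exchange might look roughly like this, assuming the delta-plus-hash payload sketched earlier; `fetch_full` stands in for a hypothetical full re-fetch from Wikidata:

```python
import hashlib
import json

def handle_push(cache, push, fetch_full):
    """Apply a pushed delta to the local wikidata cache; if the hash of
    the result doesn't match (i.e. an earlier push was missed), fall back
    to a full re-fetch via the hypothetical `fetch_full` callback."""
    item = dict(cache.get(push["item"], {}))
    item.update(push["delta"])
    digest = hashlib.sha1(
        json.dumps(item, sort_keys=True).encode("utf-8")
    ).hexdigest()
    if digest != push["hash"]:
        item = fetch_full(push["item"])  # recover from a missed update
    cache[push["item"]] = item
    return item
```

Parser-cache invalidation for the affected pages would happen right after this write; it is left out here because it lives in MediaWiki, not in the cache layer.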
>
> * when a page is rendered, interlanguage links and other info are taken from the
> local wikidata cache. No queries are made to wikidata during parsing/rendering.
>
> * In case an update is missed, we need a mechanism to allow requesting a full
> purge and re-fetch of all data on the client side, and not just wait until
> the next push, which might very well take a very long time to happen.
> ** There needs to be a manual option for when someone detects this. Maybe
> action=purge can be made to do this. Simple cache-invalidation however shouldn't
> pull info from wikidata.
> ** A time-to-live could be added to the local copy of the data so that it's
> updated by doing a pull periodically, so the data does not stay stale
> indefinitely after a failed push.
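The time-to-live fallback could be as simple as stamping each cache entry with its fetch time and re-pulling on read once it has aged out. A minimal sketch, with a made-up one-day TTL and a hypothetical `fetch_full` callback:

```python
import time

TTL_SECONDS = 24 * 60 * 60  # hypothetical: re-pull anything older than a day

def get_item(cache, item_id, fetch_full, now=None):
    """Return an item from the local cache, re-pulling it from Wikidata
    (via `fetch_full`) once its TTL has expired, so a failed push cannot
    leave the local copy stale indefinitely."""
    now = time.time() if now is None else now
    entry = cache.get(item_id)
    if entry is None or now - entry["fetched_at"] > TTL_SECONDS:
        entry = {"data": fetch_full(item_id), "fetched_at": now}
        cache[item_id] = entry
    return entry["data"]
```

Checking the TTL lazily at read time keeps the mechanism cheap: no background job is needed, and an entry that nobody renders never triggers a pull.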
>
> === Variation: shared database tables ===
>
> Instead of having a local wikidata cache on each wiki (which may grow big - a
> first guesstimate of Jeroen and Reedy is up to 1TB total, for all wikis), all
> client wikis could access the same central database table(s) managed by the
> wikidata wiki.
>
> * this is similar to the way the globalusage extension tracks the usage of
> commons images
> * whenever a page is re-rendered, the local wiki would query the table in the
> wikidata db. This means a cross-cluster db query whenever a page is rendered,
> instead of a local query.
> * the HTTP push mechanism described above would still be needed to purge the
> parser cache when needed. But the push requests would not need to contain the
> updated data, they may just be requests to purge the cache.
> * the ability for full HTTP pushes (using the mediawiki API or some other
> interface) would still be desirable for 3rd party integration.
>
> * This approach greatly lowers the amount of space used in the database
> * it doesn't change the number of http requests made
> ** it does however reduce the amount of data transferred via http (but not by
> much, at least not compared to pushing diffs)
> * it doesn't change the number of database requests, but it introduces
> cross-cluster requests
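Under the shared-table variation the push shrinks to a bare purge request: no item data, just enough to find the affected pages. A sketch of that thinner client handler (the usage lookup and callback are hypothetical; the proposal says the usage is, for now, derivable from the language links):

```python
# hypothetical usage-tracking table, in the spirit of the globalusage extension
USAGE = {"Q1234": ["en:Almere", "nl:Almere"]}

def pages_using(item_id):
    """Return the pages known to use a given item."""
    return USAGE.get(item_id, [])

def handle_purge_push(push, invalidate_parser_cache):
    """Shared-table variant: the push carries no data, only the item id.
    The client just invalidates the parser cache for affected pages; the
    data itself is read from the central wikidata tables at render time."""
    for page in pages_using(push["item"]):
        invalidate_parser_cache(page)
```

The trade this makes explicit: the write path gets cheaper (tiny purge messages, no duplicated storage), while the read path pays for it with a cross-cluster query per render.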
>
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l