On 09.07.2014 08:14, Dimitris Kontokostas wrote:
Hi,
Is it easy to briefly summarize the added value (or supported use cases) of
switching to PubSubHubbub?
* It's easier to handle than OAI, because it uses the standard dump format.
* It's also push-based, avoiding constant polling on small wikis.
* The OAI extension has been deprecated for a long time now.
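To make the push-based model above concrete, here is a minimal sketch of what a PuSH subscription request looks like on the wire: the subscriber POSTs a few `hub.*` form fields to the hub, which then verifies the callback. The hub, topic, and callback URLs below are hypothetical, for illustration only.

```python
from urllib.parse import urlencode

def build_subscription_request(topic, callback, lease_seconds=86400):
    """Build the form-encoded body of a PuSH subscription request.

    Per the PubSubHubbub spec, a subscriber POSTs these fields to the
    hub; the hub then verifies the callback URL before pushing updates.
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,        # feed URL to subscribe to
        "hub.callback": callback,  # endpoint the hub pushes updates to
        "hub.lease_seconds": str(lease_seconds),
    })

# Hypothetical endpoints, for illustration only:
body = build_subscription_request(
    "https://example.org/wiki/updates.atom",
    "https://mirror.example.net/push-endpoint",
)
```

After this handshake, the subscriber just waits for the hub to deliver new content, instead of polling the wiki itself.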
The edit stream in Wikidata is so huge that I can hardly think of anyone wanting
to be in *real-time* sync with Wikidata.
With 20 p/s, their infrastructure would have to be pretty scalable not to break.
The "push" aspect is probably most useful for small wikis. It's true that for
large wikis, you could just poll, since you would hardly ever poll in vain.
It would be very nice if the sync could be filtered by namespace, category, etc.
But PubSubHubbub (I'll use "PuSH" from now on) sadly doesn't really support
this.
Maybe I am biased by DBpedia, but in some experiments on the English Wikipedia
we found that the ideal OAI-PMH update interval was every ~5 minutes.
OAI-PMH aggregates multiple revisions of a page into a single record, so when
we ask "get me the items that changed in the last 5 minutes", we skip the
processing of many minor edits.
It looks like we lose this option with PubSubHubbub, right?
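For reference, the kind of 5-minute OAI-PMH poll described above is a `ListRecords` request with a `from` timestamp. A minimal sketch of building such a request follows; the endpoint URL and the `metadataPrefix` value are assumptions, not taken from the thread.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

def build_oai_poll_url(base_url, minutes=5, metadata_prefix="mediawiki"):
    """Build an OAI-PMH ListRecords request covering the last N minutes.

    Since OAI-PMH returns one record per page within the window (the
    latest revision), a 5-minute window skips intermediate minor edits.
    The metadataPrefix value here is an assumption.
    """
    since = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint, for illustration only:
url = build_oai_poll_url("https://example.org/w/oai")
```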
I'm not quite positive on this point, but I think with PuSH this is done by the
hub: if the hub gets 20 notifications for the same resource in one minute, it
will only fetch and distribute the latest version, not all 20.
But perhaps someone from the PuSH development team could confirm this.
As we asked before, does PubSubHubbub support mirroring a Wikidata clone? The
OAI-PMH extension has this option.
Yes, there is a client extension for PuSH, allowing for seamless replication of
one wiki into another, including creation and deletion (I don't know about
moves/renames).
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.