On 09.07.2014 08:14, Dimitris Kontokostas wrote:
> Hi,
>
> Is it easy to brief the added value (or supported use cases) by switching
> to PubSubHubbub?

* It's easier to handle than OAI, because it uses the standard dump format.
* It's also push-based, avoiding constant polling on small wikis.
* The OAI extension has been deprecated for a long time now.

> The edit stream in Wikidata is so huge that I can hardly think of anyone wanting
> to be in *real-time* sync with Wikidata
> With 20 p/s their infrastructure should be pretty scalable to not break.

The "push" aspect is probably most useful for small wikis. It's true that for large
wikis you could just poll, since you would hardly ever poll in vain.

It would be very nice if the sync could be filtered by namespace, category, etc.
But PubSubHubbub (I'll use "PuSH" from now on) doesn't really support this, sadly.

> Maybe I am biased with DBpedia but by doing some experiments on English
> Wikipedia we found that the ideal update with OAI-PMH time was every ~5 minutes.
> OAI aggregates multiple revisions of a page to a single edit
> so when we ask: "get me the items that changed the last 5 minutes" we skip the
> processing of many minor edits
> It looks like we lose this option with PubSubHubbub right?

I'm not quite positive on this point, but I think with PuSH, this is done by the
hub. If the hub gets 20 notifications for the same resource in one minute, it
will only grab and distribute the latest version, not all 20.

But perhaps someone from the PuSH development team could confirm this.

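To illustrate the kind of coalescing I mean (the same effect OAI-PMH gives you per polling window), here is a minimal Python sketch. It is not taken from any PuSH hub implementation; the Notification record, the coalesce() helper and the fixed time window are hypothetical, chosen only to show the idea of keeping just the latest revision per resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """Hypothetical change notification: which resource changed, to which revision."""
    resource: str      # e.g. a page title or entity ID
    revision: int      # revision ID; higher means newer
    timestamp: float   # seconds since some arbitrary epoch


def coalesce(notifications, window_start, window_end):
    """Collapse all notifications received in [window_start, window_end)
    into one entry per resource, keeping only the newest revision.

    A subscriber updated once per window then fetches each changed
    resource once, instead of once per edit.
    """
    latest = {}
    for n in notifications:
        if not (window_start <= n.timestamp < window_end):
            continue  # outside this window; handled by another batch
        current = latest.get(n.resource)
        if current is None or n.revision > current.revision:
            latest[n.resource] = n
    # Sort by resource name so the output order is deterministic.
    return [latest[r] for r in sorted(latest)]


# 20 edits to the same item within one minute, plus one edit to another item.
events = [Notification("Q42", 1000 + i, 3.0 * i) for i in range(20)]
events.append(Notification("Q64", 500, 10.0))

batch = coalesce(events, 0.0, 60.0)
for n in batch:
    print(n.resource, n.revision)
```

Here the 20 edits to Q42 collapse into a single entry for revision 1019, so only two resources need to be fetched for that minute.
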
> As we already asked before, does PubSubHubbub supports mirroring a wikidata
> clone? The OAI-PMH extension has this option

Yes, there is a client extension for PuSH, allowing for seamless replication of
one wiki into another, including creation and deletion (I don't know about
moves/renames).

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.