On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
Am 09.07.2014 08:14, schrieb Dimitris Kontokostas:
> Hi,
>
> Is it easy to brief the added value (or supported use cases) by switching
> to PubSubHubbub?

* It's easier to handle than OAI, because it uses the standard dump format.
* It's also push-based, avoiding constant polling on small wikis.
* The OAI extension has been deprecated for a long time now.
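To make the push model concrete, here is a minimal Python sketch of the two subscriber-side steps in PubSubHubbub. First, the subscriber asks the hub for a subscription. Second, it answers the hub's verification request by echoing `hub.challenge`. Parameter names follow the PubSubHubbub 0.4 draft; 0.3 hubs additionally expect `hub.verify`. The topic and callback URLs and the helper names are hypothetical. The sketch only builds and checks strings and makes no HTTP calls.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode

# Hypothetical URLs: the real topic URL depends on how the wiki's PuSH setup exposes its feed
TOPIC = "https://wiki.example.org/push-feed"
CALLBACK = "https://mirror.example.org/push-callback"

def build_subscribe_body(topic: str, callback: str) -> str:
    """Form-encoded body the subscriber POSTs to the hub to request a subscription."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
    })

def verify_intent(query: str, expected_topic: str) -> Tuple[int, str]:
    """Answer the hub's verification GET on the callback URL.

    Echo hub.challenge with HTTP 200 only when the hub is confirming a topic
    we asked for; otherwise refuse with 404 so no subscription is created.
    """
    params = parse_qs(query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic == expected_topic and challenge:
        return 200, challenge
    return 404, ""

print(build_subscribe_body(TOPIC, CALLBACK))
```

A real callback would run inside a web server. Keeping the check as a pure function makes it easy to test separately.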

> The edit stream in Wikidata is so huge that I can hardly think of anyone wanting
> to be in *real-time* sync with Wikidata.
> At 20 p/s, their infrastructure would need to be very scalable not to break.

The "push" aspect is probably most useful for small wikis. It's true, for large
wikis, you could just poll, since you would hardly ever poll in vain.

It would be very nice if the sync could be filtered by namespace, category, etc.
But PubSubHubbub (I'll use "PuSH" from now on) sadly doesn't really support this.
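Because PuSH itself offers no namespace filter, one possible workaround is for the subscriber to filter each notification after it arrives. The Python sketch below assumes the pushed payload is MediaWiki's XML export (dump) format, with an `<ns>` element inside each `<page>`. The sample payload and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical payload in the MediaWiki XML export ("dump") format; the
# schema version in the namespace URI varies between wikis.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Q42</title><ns>0</ns></page>
  <page><title>Talk:Q42</title><ns>1</ns></page>
  <page><title>Property:P31</title><ns>120</ns></page>
</mediawiki>"""

def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def pages_in_namespaces(xml_text: str, wanted: Set[int]) -> List[str]:
    """Return titles of <page> elements whose <ns> number is in `wanted`."""
    root = ET.fromstring(xml_text)
    titles = []
    for page in root:
        if local_name(page.tag) != "page":
            continue  # skip <siteinfo> and anything else that is not a page
        fields = {local_name(child.tag): child.text for child in page}
        if fields.get("ns") is not None and int(fields["ns"]) in wanted:
            titles.append(fields["title"])
    return titles

print(pages_in_namespaces(SAMPLE, {0, 120}))
```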

> Maybe I am biased by DBpedia, but in some experiments on English
> Wikipedia we found that the ideal OAI-PMH update interval was ~5 minutes.
> OAI aggregates multiple revisions of a page into a single edit,
> so when we ask "get me the items that changed in the last 5 minutes", we skip
> processing many minor edits.
> It looks like we lose this option with PubSubHubbub, right?

I'm not entirely sure about this point, but I think with PuSH, this is done by the
hub. If the hub gets 20 notifications for the same resource in one minute, it
will only grab and distribute the latest version, not all 20.

But perhaps someone from the PuSH development team could confirm this.
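Whether a given hub actually coalesces notifications is unconfirmed in this thread. The following Python sketch only illustrates the effect both sides describe. Edits are grouped into fixed windows (300 seconds here, matching the ~5-minute OAI-PMH interval mentioned above), and only the newest revision of each page in a window is kept for fetching. The function name and the (timestamp, page, revision) tuple shape are assumptions, not part of either protocol.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical notification shape: (unix timestamp, page title, revision id)
Notification = Tuple[float, str, int]

def coalesce_by_window(
    notifications: Iterable[Notification], window_seconds: float = 300.0
) -> Dict[int, List[Tuple[str, int]]]:
    """Group notifications into fixed time windows and keep only the newest
    revision of each page per window, so each page is fetched once per window."""
    windows: Dict[int, Dict[str, Tuple[float, int]]] = {}
    for ts, page, rev in notifications:
        bucket = int(ts // window_seconds)  # index of the window this edit falls in
        latest = windows.setdefault(bucket, {})
        if page not in latest or ts >= latest[page][0]:
            latest[page] = (ts, rev)  # a newer edit replaces the older one
    return {
        bucket: sorted((page, rev) for page, (_, rev) in latest.items())
        for bucket, latest in sorted(windows.items())
    }

events = [(0, "Q1", 1), (10, "Q1", 2), (20, "Q2", 5), (290, "Q1", 3), (310, "Q1", 4)]
print(coalesce_by_window(events))
# → {0: [('Q1', 3), ('Q2', 5)], 1: [('Q1', 4)]}
```

Here five notifications collapse into three fetches, because Q1 is fetched only once per window, at its latest revision.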

It'd be great if the dev team could confirm this.
Besides push notifications, is polling an option in PuSH? I skimmed through the spec but couldn't find this.
 

> As we asked before, does PubSubHubbub support mirroring a Wikidata
> clone? The OAI-PMH extension has this option.

Yes, there is a client extension for PuSH, allowing for seamless replication of
one wiki into another, including creation and deletion (I don't know about
moves/renames).

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



--
Kontokostas Dimitris