Could you briefly explain the added value (or the supported use cases) of switching to PubSubHubbub?
The edit stream in Wikidata is so huge that I can hardly think of anyone wanting to be in *real-time* sync with Wikidata.
At 20 p/s, their infrastructure would have to be pretty scalable not to break.

Maybe I am biased by DBpedia, but in some experiments on English Wikipedia we found that the ideal update interval with OAI-PMH was about every 5 minutes.
OAI aggregates multiple revisions of a page into a single edit,
so when we ask "get me the items that changed in the last 5 minutes", we skip the processing of many minor edits.
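
To make the aggregation point concrete, here is a minimal sketch of collapsing a window of revisions to one entry per page. It is not the OAI-PMH extension's code; the `Revision` fields, the `changed_pages` helper and the sample data are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Revision:
    page: str       # page title or item ID (hypothetical field)
    rev_id: int     # revision ID, assumed to increase over time
    timestamp: int  # seconds since some reference point


def changed_pages(revisions: Iterable[Revision], since: int, until: int) -> Dict[str, Revision]:
    """Return only the newest revision per page within [since, until)."""
    latest: Dict[str, Revision] = {}
    for rev in revisions:
        if not (since <= rev.timestamp < until):
            continue  # outside the polling window
        current = latest.get(rev.page)
        if current is None or rev.rev_id > current.rev_id:
            latest[rev.page] = rev
    return latest


# Hypothetical edit stream: three edits to Q42 inside one 5-minute window.
edits = [
    Revision("Q42", 101, 10),
    Revision("Q42", 102, 70),
    Revision("Q64", 103, 120),
    Revision("Q42", 104, 290),
    Revision("Q1", 105, 310),  # falls outside the window below
]
window = changed_pages(edits, since=0, until=300)
print(sorted(window))        # pages to reprocess
print(window["Q42"].rev_id)  # only the newest revision is kept
```

With a per-edit push, each of the three Q42 revisions would be processed separately; with polling, only the last one is.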

It looks like we lose this option with PubSubHubbub, right?
As we already asked before, does PubSubHubbub support mirroring a Wikidata clone? The OAI-PMH extension has this option.


On Tue, Jul 8, 2014 at 11:31 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
Replying to myself because I forgot to mention an important detail:

On 08.07.2014 at 10:22, Daniel Kinzler wrote:
> On 08.07.2014 at 01:46, Rob Lanphier wrote:
>> On Fri, Jul 4, 2014 at 7:16 AM, Lydia Pintscher <lydia.pintscher@wikimedia.de
> ...
>> Hi Lydia,
>> Thanks for providing the basic overview of this.  Could you (or someone on the
>> team) provide an explanation about how you would like this to be configured on
>> the Wikimedia cluster?
> We'd like to enable it just on Wikidata at first, but I see no reason not to
> enable it for all projects if that goes well.
> The PubSubHubbub (PuSH) extension would be configured to push notifications to
> the Google hub (two per edit). The hub then notifies any subscribers via their
> callback URLs.
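
For readers unfamiliar with the protocol, the publisher-side notification can be sketched roughly as below. This assumes the form parameters from the PubSubHubbub specification as I recall them (`hub.mode=publish`, `hub.url`); the hub and topic URLs are placeholders, `build_publish_ping` is a hypothetical helper, and the actual extension may build its requests differently. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

HUB_URL = "https://hub.example.org/"  # placeholder, not the real hub address


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build (but do not send) a 'content changed' ping for one topic URL."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder topic URL standing in for the page that was edited.
ping = build_publish_ping(HUB_URL, "https://wiki.example.org/topic/Q42")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Note that the ping carries only the topic URL, which is why the hub has to come back and fetch the content itself.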

We need a proxy to be set up to allow the app servers to talk to the Google hub.
If this is deployed at full scale, we expect in excess of 20 POST requests per
second (two per edit), plus up to the same number (but probably fewer) of GET
requests coming back from the hub, asking for the full page content of every
page changed, as XML export, from a special page interface similar to
Special:Export. This would probably bypass the web cache.
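
As a back-of-the-envelope check on those numbers, the sketch below turns the stated rates into daily totals. The only inputs taken from this mail are 20 POSTs per second and two POSTs per edit; since the estimate is "in excess of" 20, the results are lower bounds, and the GET figure is an upper bound relative to the POSTs.

```python
POSTS_PER_SECOND = 20          # "in excess of 20", so treat as a lower bound
POSTS_PER_EDIT = 2             # two notifications per edit
SECONDS_PER_DAY = 24 * 60 * 60

edits_per_second = POSTS_PER_SECOND / POSTS_PER_EDIT
posts_per_day = POSTS_PER_SECOND * SECONDS_PER_DAY
max_gets_per_day = posts_per_day  # "up to the same number" of GETs
max_requests_per_day = posts_per_day + max_gets_per_day

print(edits_per_second)      # implied edits per second
print(posts_per_day)         # outgoing POSTs per day
print(max_requests_per_day)  # POSTs plus GETs per day, at most
```

That works out to roughly 1.7 million outgoing POSTs per day, and up to about 3.5 million requests per day once the hub's GETs are included.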

PubSubHubbub is nice and simple, but it's really designed for news feeds, not
for versioned content of massive collaborative sites. It works, but it's not as
efficient as we could wish.

Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

Wikidata-tech mailing list

Kontokostas Dimitris