Hi,
Could you briefly summarize the added value (or the supported use cases) of
switching to PubSubHubbub?
The edit stream in Wikidata is so huge that I can hardly think of anyone
wanting to be in *real-time* sync with Wikidata.
At 20 pings per second, Google's hub infrastructure would have to be quite
scalable to avoid breaking.
Maybe I am biased by DBpedia, but in experiments on English Wikipedia we
found that the ideal OAI-PMH update interval was about every 5 minutes.
OAI-PMH aggregates multiple revisions of a page into a single change, so
when we ask "give me the items that changed in the last 5 minutes", we skip
processing many minor edits. It looks like we lose this option with
PubSubHubbub, right?
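To make the aggregation point concrete, here is a minimal Python sketch. It
assumes a hypothetical in-memory list of edit events; `Edit` and
`pages_changed_since` are illustrative names, not part of the OAI-PMH
extension. A "changed since T" poll collapses several revisions of one page
into a single entry, whereas per-edit push delivers a notification for every
revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical edit event; field names are illustrative only.
    page: str       # page title or item id, e.g. "Q42"
    revision: int   # revision id (higher = newer)
    timestamp: int  # seconds since epoch


def pages_changed_since(edits, since):
    """Collapse all edits at or after `since` to the latest revision per page.

    This mimics a "what changed in the last N minutes?" poll: one entry
    per page, no matter how many times the page was edited in the window.
    """
    latest = {}
    for e in edits:
        if e.timestamp < since:
            continue  # outside the polling window
        current = latest.get(e.page)
        if current is None or e.revision > current.revision:
            latest[e.page] = e
    return latest
```

With three edits to Q42 and one to Q64 inside a 5-minute window, per-edit
push would send a notification for each of the four edits, while the poll
returns only two pages, each at its newest revision.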
As we already asked before, does PubSubHubbub support mirroring a Wikidata
clone? The OAI-PMH extension has this option.
Best,
Dimitris
On Tue, Jul 8, 2014 at 11:31 AM, Daniel Kinzler <daniel.kinzler(a)wikimedia.de> wrote:
Replying to myself because I forgot to mention an important detail:
On 08.07.2014 10:22, Daniel Kinzler wrote:
On 08.07.2014 01:46, Rob Lanphier wrote:
> On Fri, Jul 4, 2014 at 7:16 AM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:
...
> Hi Lydia,
>
> Thanks for providing the basic overview of this. Could you (or someone on
> the team) provide an explanation of how you would like this to be
> configured on the Wikimedia cluster?
We'd like to enable it just on Wikidata at first, but I see no reason not to
enable it for all projects if that goes well.
The PubSubHubbub (PuSH) extension would be configured to push notifications
to the Google hub (two per edit). The hub then notifies any subscribers via
their callback URLs.
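For readers unfamiliar with PuSH, the publisher side is just a small
form-encoded POST to the hub. A minimal sketch, assuming the PubSubHubbub 0.3
publish ping (`hub.mode=publish`, `hub.url=<topic URL>`); the hub address
(Google's public hub at the time) and the topic URL are assumed placeholders,
and this is not the extension's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed hub endpoint: Google's public PuSH hub at the time of this thread.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) a PuSH 0.3-style publish ping for one topic.

    The hub is expected to fetch `topic_url` itself and then forward the
    new content to every subscriber's callback URL.
    """
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical topic URL for an edited item; the real feed URLs may differ.
req = build_publish_ping(HUB_URL, "https://www.wikidata.org/wiki/Q42")
# urllib.request.urlopen(req) would actually send the ping.
```

The hub, not the wiki, does the fan-out to subscribers, which is why the wiki
only needs outbound access to a single hub endpoint.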
We need a proxy set up so that the app servers can talk to the Google hub.
If this is deployed at full scale, we expect in excess of 20 POST requests
per second (two per edit), plus up to the same number (but probably fewer)
of GET requests coming back from the hub, asking for the full content of
every changed page as an XML export, from a special page interface similar
to Special:Export. This would probably bypass the web cache.
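Working backwards, 20 POSTs per second at two per edit means roughly 10 edits
per second. A back-of-the-envelope helper (hypothetical, for illustration
only), assuming the worst case of exactly one hub fetch per ping:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def push_load(edits_per_second, pings_per_edit=2):
    """Rough request rates for the PuSH setup described above.

    Assumes every ping triggers at most one GET back from the hub, so the
    GET figure is an upper bound, not an expected value.
    """
    posts_per_s = edits_per_second * pings_per_edit
    max_gets_per_s = posts_per_s  # "up to the same number (but probably fewer)"
    return {
        "posts_per_s": posts_per_s,
        "max_gets_per_s": max_gets_per_s,
        "max_total_per_s": posts_per_s + max_gets_per_s,
        "posts_per_day": posts_per_s * SECONDS_PER_DAY,
    }
```

For 10 edits per second this gives 20 POSTs/s, at most 40 requests/s in
total, and about 1.7 million POSTs per day.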
PubSubHubbub is nice and simple, but it's really designed for news feeds,
not for versioned content of massive collaborative sites. It works, but it's
not as efficient as we could wish.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech