On Thu, Jul 10, 2014 at 3:50 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
> On 09.07.2014 19:39, Dimitris Kontokostas wrote:
>> On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
>>> On 09.07.2014 08:14, Dimitris Kontokostas wrote:
>>>> Maybe I am biased with DBpedia, but by doing some experiments on English
>>>> Wikipedia we found that the ideal update interval with OAI-PMH was every
>>>> ~5 minutes. OAI aggregates multiple revisions of a page into a single
>>>> edit, so when we ask "get me the items that changed in the last 5
>>>> minutes" we skip the processing of many minor edits. It looks like we
>>>> lose this option with PubSubHubbub, right?
>>>
>>> I'm not quite positive on this point, but I think with PuSH this is done
>>> by the hub. If the hub gets 20 notifications for the same resource in
>>> one minute, it will only grab and distribute the latest version, not
>>> all 20. But perhaps someone from the PuSH development team could
>>> confirm this.
>>
>> It'd be great if the dev team could confirm this. Besides push
>> notifications, is polling an option in PuSH? I skimmed through the spec
>> but couldn't find this.
>
> Yes. You can just poll the interface that the hub uses to fetch new data.
Thanks for the info, Daniel.

I'm waiting for the dev team to confirm the revision merging, and I have one last question / use case.

Since you'll sync to an external server (at Google, right?), did you set any requirements on the durability of the changesets? I mean, are the changes stored *forever*, or did you set a TTL? E.g. my application breaks for a week and I want to resume, or I download a one-month-old dump and want to get in sync, etc.

With OAI-PMH I could, for instance, set the date to 15/01/2001 and get all pages by modification date. In PuSH this would require some sort of importing and is probably out of the question, right? :)
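For reference, that OAI-PMH resumption pattern is just a `ListRecords` request with a `from` date, which the spec defines as selective harvesting by modification date. A minimal sketch (the endpoint URL is a placeholder, not a real repository):

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://example.org/oai"  # placeholder endpoint

def list_records_url(from_date, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request selecting records by modification date."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,  # UTC date, YYYY-MM-DD, per the OAI-PMH spec
    }
    return OAI_ENDPOINT + "?" + urlencode(params)

print(list_records_url("2001-01-15"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2001-01-15
```

A consumer that was offline for a week could simply reissue this with the timestamp of its last successful harvest, which is exactly the capability that seems to be lost with a push-only hub.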
Cheers, Dimitris
> -- daniel
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.