On Thu, Jul 10, 2014 at 3:50 PM, Daniel Kinzler <daniel.kinzler(a)wikimedia.de>
wrote:
> Am 09.07.2014 19:39, schrieb Dimitris Kontokostas:
> > On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler <
> > <mailto:email@example.com>
> > > Am 09.07.2014 08:14, schrieb Dimitris Kontokostas:
> > > > Maybe I am biased with DBpedia, but from some experiments on
> > > > Wikipedia we found that the ideal OAI-PMH update interval was
> > > > every ~5 minutes.
> > > > OAI aggregates multiple revisions of a page into a single edit,
> > > > so when we ask "get me the items that changed in the last 5
> > > > minutes", we skip the processing of many minor edits.
> > > > It looks like we lose this option with PubSubHubbub, right?
> > > I'm not quite positive on this point, but I think with PuSH this is
> > > done by the hub. If the hub gets 20 notifications for the same
> > > resource in one minute, it will only grab and distribute the latest
> > > version, not all 20.
> > > But perhaps someone from the PuSH development team could confirm this.
> > It'd be great if the dev team could confirm this.
> > Besides push notifications, is polling an option in PuSH? I browsed
> > through the spec but couldn't find this.
> Yes. You can just poll the interface that the hub uses to fetch new data.
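For what it's worth, the hub behavior Daniel describes (collapsing many
notifications for one page into a single fetch of the latest version) is easy
to reproduce on the polling side as well. A minimal sketch, assuming the topic
feed is Atom and that each entry carries a page title and an `updated`
timestamp (the actual Wikidata feed layout may differ):

```python
# Sketch: collapse a polled Atom feed to the newest entry per page,
# mimicking how a PuSH hub would merge 20 notifications for one
# resource into a single fetch of the latest version.
# The feed structure here is an assumption, not the real Wikidata feed.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def latest_per_page(atom_xml):
    """Return {page title: newest 'updated' timestamp} for a feed."""
    root = ET.fromstring(atom_xml)
    latest = {}
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        updated = entry.findtext(ATOM + "updated")
        # ISO 8601 UTC timestamps compare correctly as strings.
        if title not in latest or updated > latest[title]:
            latest[title] = updated
    return latest
```

A consumer polling every few minutes would fetch the feed, run it through
something like this, and then process each page only once per interval.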
Thanks for the info, Daniel.
I'm waiting for the dev team to confirm the revision merging, and I have one
last question / use case.
Since you'll sync to an external server (at Google, right?), did you set any
requirements on the durability of the changesets?
I mean, are the changes stored *forever* or did you set a TTL?
E.g. my application breaks for a week and I want to resume, or I download a
one-month-old dump and want to get in sync, etc.
In OAI-PMH I could, for instance, set the date to 15/01/2001 and get all
pages by modification date.
In PuSH this would require some sort of importing and is probably out of
the question, right? :)
Senior Software Developer
Gesellschaft zur Förderung Freien Wissens e.V.