Hi,
On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
> Well, PubSubHubbub is a nice idea. However, it clearly depends on two factors:
> - whether Wikidata sets up such an infrastructure (I need to check whether we have the capacity; I am not sure at the moment)
Capacity for what? The infrastructure should not be a problem. (Famous last words; I can look more closely tomorrow, but I'm really not worried about it.) And you don't need any infrastructure at all for development; just use one of Google's public hub instances.
> - whether performance is good enough to handle high-volume publishers
Again, how do you mean?
> Basically, polling recent changes [1] and then doing an HTTP request for the individual pages should be fine for a start. So I guess this is what we will implement, unless there are better suggestions. The whole issue is problematic, and the DBpedia project would be happy if this were discussed and decided now, so we can plan development.
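A minimal sketch of that polling approach, assuming the standard MediaWiki Action API (the endpoint URL and parameter choices here are my assumptions, not something specified in this thread):

```python
# Hypothetical sketch: poll the MediaWiki recent-changes list, then do an
# HTTP request for each changed page. The API endpoint is an assumption.
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"  # assumed endpoint


def build_rc_url(api=API, limit=10):
    """Build an Action API query URL for the most recent changes."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": limit,
        "format": "json",
    })
    return f"{api}?{params}"


def poll_once(api=API):
    """One polling round: list recent changes, then yield a fetch URL per page."""
    with urllib.request.urlopen(build_rc_url(api)) as resp:
        changes = json.load(resp)["query"]["recentchanges"]
    for change in changes:
        page_url = f"{api}?" + urllib.parse.urlencode({
            "action": "query",
            "titles": change["title"],
            "prop": "revisions",
            "rvprop": "content",
            "format": "json",
        })
        # A real consumer would fetch page_url here and process the content.
        yield change["title"], page_url
```

Polling like this is simple but wastes requests when nothing has changed, which is exactly the gap a push protocol such as PubSubHubbub is meant to close.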
> What is the best practice to get updates from Wikipedia at the moment?
I believe just about everyone uses the IRC feed from irc.wikimedia.org: https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
I imagine Wikidata will, or maybe already does, propagate changes to a channel on that server, but I can imagine IRC would not be a good method for many Instant Data Repo users. Some will not be able to sustain a single TCP connection for extended periods, some will not be able to use IRC ports at all, and some may go offline periodically, e.g. a server on a laptop. As I understand it, PubSubHubbub has none of those problems and is better than the current IRC solution in just about every way.
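For context, the subscriber side of PubSubHubbub is deliberately simple: after a subscription request, the hub verifies intent with a GET to the subscriber's callback URL, and the subscriber confirms by echoing back hub.challenge. A sketch of that handshake (the topic URL is a placeholder, not a real feed):

```python
def handle_verification(params: dict, expected_topic: str):
    """Handle a hub's subscription-verification GET request.

    Per the PubSubHubbub spec, the hub sends hub.mode, hub.topic and
    hub.challenge as query parameters; the subscriber must echo the
    challenge with a 200 if it recognises the subscription, and refuse
    (e.g. 404) otherwise.
    """
    if (params.get("hub.mode") in ("subscribe", "unsubscribe")
            and params.get("hub.topic") == expected_topic):
        return 200, params["hub.challenge"]
    return 404, ""


# Example: a hub confirming a subscription to a (placeholder) topic URL.
status, body = handle_verification(
    {"hub.mode": "subscribe",
     "hub.topic": "https://example.org/recentchanges.atom",  # placeholder
     "hub.challenge": "abc123"},
    expected_topic="https://example.org/recentchanges.atom",
)
# status == 200 and body == "abc123"
```

Because subscribers only need to answer plain HTTP callbacks, intermittently connected consumers can resubscribe when they come back online instead of holding a TCP connection open, which is the advantage over the IRC feed described above.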
We could potentially even replace the current cross-DB job-queue insert craziness with PubSubHubbub for internal use on the cluster.
-Jeremy