Hi,
On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
<hellmann(a)informatik.uni-leipzig.de> wrote:
> Well, PubSubHubbub is a nice idea. However, it clearly
> depends on two factors:
> 1. whether Wikidata sets up such an infrastructure (I need to check whether we have
> capacities, I am not sure atm)
Capacity for what? The infrastructure should not be a problem.
(Famous last words; I can look more closely tomorrow, but I'm really
not worried about it.) And you don't need any infrastructure at all
for development; just use one of Google's public instances.
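For concreteness, subscribing through a public hub is a single HTTP POST. Here is a minimal sketch using only the Python standard library; the callback and topic URLs are placeholders I made up, and the hub URL is Google's public instance:

```python
# Sketch: subscribing to a feed via a public PubSubHubbub hub.
# The callback and topic URLs below are illustrative assumptions.
from urllib.parse import urlencode
from urllib.request import Request

HUB = "https://pubsubhubbub.appspot.com/"  # Google's public hub

def build_subscribe_request(callback_url, topic_url, hub=HUB):
    """Build the POST request a PubSubHubbub subscriber sends to a hub."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.callback": callback_url,   # where the hub pushes updates
        "hub.topic": topic_url,         # the feed we want updates for
        "hub.verify": "async",          # hub verifies the callback later
    })
    return Request(hub, data=body.encode("ascii"),
                   headers={"Content-Type":
                            "application/x-www-form-urlencoded"})

# Hypothetical usage (network call left commented out):
req = build_subscribe_request(
    "https://example.org/push-callback",        # placeholder callback
    "https://example.org/wikidata-updates.atom", # placeholder topic feed
)
# urllib.request.urlopen(req)  # hub accepts the subscription asynchronously
```

The hub handles fan-out and retries from there, which is exactly what makes it attractive compared to every consumer polling on its own.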
> 2. whether performance is good enough to handle
> high-volume publishers
Again, how do you mean?
> Basically, polling recent changes [1] and then doing an HTTP request
> to the individual pages should be fine for a start. So I guess this is
> what we will implement, if there aren't any better suggestions.
> The whole issue is problematic, and the DBpedia project would be happy
> if this were discussed and decided right now, so we can plan development.
> What is the best practice to get updates from Wikipedia at the moment?
I believe just about everyone uses the IRC feed from
irc.wikimedia.org.
https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
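Consuming that raw feed amounts to a bare-bones IRC client. A minimal sketch with the standard library follows; the nick and channel name are assumptions (see the channel list linked above for the real names):

```python
# Minimal sketch of listening to the raw IRC feed on irc.wikimedia.org.
# Nick and channel below are placeholders, not verified channel names.
import socket

SERVER, PORT = "irc.wikimedia.org", 6667

def handshake_lines(nick, channel):
    """IRC registration and join commands, one per line (RFC 1459)."""
    return [
        f"NICK {nick}",
        f"USER {nick} 0 * :{nick}",
        f"JOIN {channel}",
    ]

# def listen(nick="rc-feed-test", channel="#wikidata.wikipedia"):
#     with socket.create_connection((SERVER, PORT)) as sock:
#         for line in handshake_lines(nick, channel):
#             sock.sendall((line + "\r\n").encode())
#         buf = b""
#         while True:
#             buf += sock.recv(4096)
#             *msgs, buf = buf.split(b"\r\n")
#             for msg in msgs:
#                 if msg.startswith(b"PING"):  # keep the connection alive
#                     sock.sendall(b"PONG" + msg[4:] + b"\r\n")
#                 else:
#                     print(msg.decode("utf-8", "replace"))
```

Note how much of this is connection babysitting (PING/PONG, reassembling lines from the TCP stream), which is part of why IRC is awkward for the consumers described below.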
I imagine Wikidata will, or maybe already does, propagate changes to a
channel on that server, but I can imagine IRC would not be a good
method for many Instant data repo users. Some will not be able to
sustain a single TCP connection for extended periods, some will not be
able to use IRC ports at all, and some may go offline periodically,
e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
problems and is better than the current IRC solution in just about
every way.
We could potentially even replace the current cross-DB job-queue
insert craziness with PubSubHubbub for use on the cluster internally.
-Jeremy