Hi!

> You will find that Wikidata is doing the very same thing, but with
> much more hardware at their disposal, since they have more funding
> than DBpedia at this point in time.

Well, we are now running on two servers (and hopefully getting another
one next quarter), with a mirror set on standby in case of disaster.
It's not *much* more hardware, I think (I have no idea what hardware
DBpedia runs on, but it must be at least one server? ;). But IIRC
DBpedia serves only a static snapshot - please correct me here if I'm
wrong - while we do live updates, which, yes, makes performance an
ongoing concern. That's why we have a 30s timeout and connection
limits :)
Still, expecting sub-second responses to arbitrary SPARQL queries
against a billion-triple database sounds unrealistic to me.
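Just to illustrate what that limit looks like from the client side - a
minimal sketch, assuming the public endpoint at
https://query.wikidata.org/sparql and Python's requests library, not
any official client code:

    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"  # public WDQS endpoint

    def run_query(query, timeout_s=30):
        """Send a SPARQL query; give up after timeout_s seconds,
        matching the server-side limit mentioned above."""
        resp = requests.get(
            ENDPOINT,
            params={"query": query, "format": "json"},
            headers={"User-Agent": "example-bot/0.1 (contact@example.org)"},
            timeout=timeout_s,
        )
        resp.raise_for_status()  # the server errors out on timeout too
        return resp.json()

    # A selective query returns well under the limit; an unselective
    # scan over billions of triples will hit the 30s server cutoff.
    print(run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5"))
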
> The key issue here is all about what method a given service provider
> chooses en route to addressing the expectations of users, as I've
> outlined above. Fundamentally, each service provider will use a
> variety of solution deployment techniques that boil down to:
>
> 1. Massive Server Clusters (sharded) and Proxies

Sharding makes querying much harder, IIUC, since a single SPARQL query
may need to touch several shards and join across them. Though I would
like to see some data on how big databases behave under sharding vs.
just distributing whole requests across replicas (see the sketch after
the next point).

> 2. Fast multi-threaded instances (no sharding, but via replication
> topologies) behind proxies (functioning as cops, so to speak).

That's basically what we're doing now.
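To make the contrast concrete, here's a toy sketch of the two routing
strategies - hypothetical classes, nothing resembling our actual proxy
setup:

    import itertools

    class Backend:
        """Stand-in for a SPARQL server holding some result rows."""
        def __init__(self, rows):
            self.rows = rows
        def run(self, sparql):
            return self.rows  # pretend we evaluated the query

    # Option 2: replication - every replica holds the full dataset, so
    # the proxy can hand any whole query to any one backend.
    class ReplicaProxy:
        def __init__(self, replicas):
            self._next = itertools.cycle(replicas)  # round-robin
        def query(self, sparql):
            return next(self._next).run(sparql)

    # Option 1: sharding - each backend holds only a slice of the data,
    # so the proxy must scatter the query and merge partial results.
    class ShardProxy:
        def __init__(self, shards):
            self._shards = shards
        def query(self, sparql):
            partials = [s.run(sparql) for s in self._shards]  # scatter
            # Merging is trivial here only because the stub ignores the
            # query; real cross-shard joins need coordination, which is
            # exactly what makes sharded SPARQL hard.
            return [row for part in partials for row in part]

    full = Backend([1, 2, 3, 4])
    print(ReplicaProxy([full, full]).query("SELECT ..."))
    print(ShardProxy([Backend([1, 2]), Backend([3, 4])]).query("SELECT ..."))
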
--
Stas Malyshev
smalyshev@wikimedia.org