Hi!
You will find that Wikidata is doing the very same thing, but with much more hardware at their disposal, since they have more funding than DBpedia at this point in time.
Well, we are now running on two servers (and hopefully getting another one next Q), with a mirror on standby in case of disaster. It's not *much* more hardware I think (no idea what hardware DBpedia runs on, but it must be at least one server? ;). But IIRC DBpedia serves only static snapshots - please correct me here if I'm wrong - while we do live updates, which, yes, makes performance an ongoing concern. That's why we have the 30s timeout and connection limits :)
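To make the timeout concrete, here is a minimal sketch in Python (using the requests library) of querying the live endpoint while staying within that limit. It assumes the public endpoint at https://query.wikidata.org/sparql; the 35-second client-side timeout and the User-Agent string are just illustrative choices, not anything we mandate beyond the 30s server cutoff.

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata query service

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .                       # instances of "house cat"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query, "format": "json"},
    # WDQS asks clients to identify themselves; contact address is made up here
    headers={"User-Agent": "example-bot/0.1 (you@example.org)"},
    timeout=35,  # client-side cap, a small cushion over the server's 30s limit
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["itemLabel"]["value"])
```

A query that exceeds the server-side 30s limit comes back as an error response rather than hanging, which is exactly the trade-off live updating forces on us.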
Still, expecting sub-second responses for arbitrary SPARQL queries against a billion-triple database sounds unrealistic to me.
The key issue here is what method a given service provider chooses to address the expectations of users, as I've outlined above. Fundamentally, each service provider will use a variety of solution deployment techniques that boil down to:
- Massive Server Clusters (sharded) and Proxies
Sharding makes querying much harder IIUC. Though I would like to see some data on how big databases behave under sharding vs. just distributing whole requests across servers (see the sketch below).
- Fast multi-threaded instances (no sharding, but replication topologies) behind proxies (functioning as cops, so to speak).
That's basically what we're doing now.
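For the archives, a toy Python sketch of that second approach: whole queries rotated round-robin across full replicas, so the proxy only polices load and never has to merge partial results the way a sharded cluster would. The replica hostnames are invented for illustration.

```python
import itertools
import requests

REPLICAS = [  # hypothetical replica endpoints, each holding a full copy
    "http://wdqs1.example.org/sparql",
    "http://wdqs2.example.org/sparql",
]
_rotation = itertools.cycle(REPLICAS)

def dispatch(query: str) -> dict:
    """Send the whole query to the next replica in round-robin order.

    Unlike sharding, no replica ever sees a partial dataset, so the
    proxy never merges partial results; it only spreads the load.
    """
    for _ in range(len(REPLICAS)):          # try each replica at most once
        replica = next(_rotation)
        try:
            resp = requests.get(
                replica,
                params={"query": query, "format": "json"},
                timeout=30,                 # mirror the server-side limit
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            continue                        # fall through to the next replica
    raise RuntimeError("all replicas failed or timed out")
```

The nice property is that each replica answers any query on its own; the cost is that every server must hold and update the full billion-triple dataset.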