Hi!
It handles data locality across a shared-nothing cluster just fine, i.e., you can interact with any node in a Virtuoso cluster and experience identical behavior (every node looks like a single-server instance in the eyes of the operator).
Does this mean no sharding, i.e. each server stores the full DB? This is the model we're using currently, but given the growth of the data it may not be sustainable on current hardware. I see in your tables that Uniprot has about 30B triples, but I wonder what the update load looks like there. Our main issue is that our current hardware is showing its limits when heavy updates run in parallel with a significant query load. So I wonder if the "single server holds everything" model is sustainable in the long term.
There are live instances of Virtuoso that demonstrate its capabilities. If you want to explore shared-nothing cluster capabilities, our live LOD Cloud cache is the place to start [1][2][3]. If you want to see the single-server open source edition, you have DBpedia, DBpedia-Live, Uniprot, and many other nodes in the LOD Cloud to choose from. All of these instances are highly connected.
Again, the question here is not so much "can you load 7bn triples into Virtuoso" - we know we can. What we want to figure out is whether, given the specific query/update patterns we have now, it is going to give us significantly better performance, enough to support our projected growth. And also whether Virtuoso has ways to make our update workflow more efficient - e.g. right now, if one triple changes in a Wikidata item, we essentially download and update the whole item (not exactly, since triples that stay the same are preserved, but it takes a lot of data transfer to express that in SPARQL). Would there be ways to update things more efficiently?
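For illustration, if the changed statement could be targeted directly, a single-value change might be expressed as a SPARQL 1.1 DELETE/INSERT along these lines (the entity and property IRIs below are hypothetical placeholders, not our actual data):

    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>

    # Replace only the single changed value instead of re-sending every
    # triple of the item. Note: if the old triple is absent, WHERE matches
    # nothing and no insert happens; a separate DELETE plus INSERT DATA
    # pair would be needed to cover newly added values.
    DELETE { wd:Q42 wdt:P1082 ?old }
    INSERT { wd:Q42 wdt:P1082 42042 }
    WHERE  { wd:Q42 wdt:P1082 ?old }

The hard part for us, of course, is computing deltas at that granularity in the first place; today our change feed effectively hands us whole items.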
Virtuoso handles both shared-nothing clusters and replication, i.e., you can have a cluster configuration used in conjunction with a replication topology if your solution requires that.
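For reference, a stock virtuoso.ini includes a [Replication] section along these lines (values here are illustrative defaults for a hypothetical server, and the parameter descriptions are approximate; see the Virtuoso documentation for the authoritative semantics):

    [Replication]
    ServerName   = db-example  ; unique name of this server in the replication topology (hypothetical)
    ServerEnable = 1           ; allow this server to publish updates to subscribers (assumed semantics)
    QueueMax     = 50000       ; upper bound on queued replication transactions (assumed semantics)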
Replication could certainly be useful, I think, if it's faster to update a single server and then replicate than to update all servers simultaneously (which is what is happening now).