On Thu, Mar 5, 2015 at 6:47 PM, Thad Guidry <thadguidry(a)gmail.com> wrote:
> Nik,
> Will you be incorporating MapGraph, as well, with GPU hardware as part of
> the scope of the Wikidata Query Service? Or is that out of scope until
> you know what the load limits will be and just use BlazeGraph as is with
> CPU-bound memory?
MapGraph isn't open source so we won't be using it.
> What are the scalability plans for also using MapGraph with GPUs and
> their memory in the future, in case the need for faster graph traversal
> arises?
So MapGraph is out but otherwise scalability plans are pretty standard
stuff:
1. Instrument for slow stuff
2. Fix bugs that make it slow
3. Buy more servers to scale out when #2 gets too slow to keep up
These servers would just be replicas. This fails when the working set
grows too large and that is something we'll be watching out for.
BlazeGraph has some horizontal scaling features that we'll invoke if we get
there.
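A minimal sketch of step 1, "instrument for slow stuff": time each query and report the ones that cross a threshold. This is purely illustrative; the class and method names here are hypothetical, not anything BlazeGraph or the query service actually ships.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical instrumentation wrapper: times a unit of work and reports
// it when it runs longer than a configured threshold.
public class SlowQueryLog {
    private final long thresholdMillis;
    private final Consumer<String> report;

    public SlowQueryLog(long thresholdMillis, Consumer<String> report) {
        this.thresholdMillis = thresholdMillis;
        this.report = report;
    }

    // Runs the query, returning its result; reports it if it was slow.
    public <T> T time(String label, Callable<T> query) throws Exception {
        long start = System.nanoTime();
        try {
            return query.call();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis >= thresholdMillis) {
                report.accept(label + " took " + elapsedMillis + "ms");
            }
        }
    }
}
```

Wiring the reporter to a log or metrics pipeline would then feed step 2 (fixing what's slow) and the decision on when step 3 (more replicas) is needed.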
Furthermore, this will all be reasonably easy to run outside of the cluster, so
if folks need to take it locally and do things with it that we can't (like
MapGraph) then it should work well.
I'm certainly wary of Java. I've worked in Java for years and I'm really
familiar with all of its baggage. BlazeGraph does a very reasonable job
with it. It feels like half of the graph databases are written in Java and
I've always wondered why. Locking down the SPARQL endpoint so it's
"impossible" to overwhelm the system is high on our list of things to do
and Java makes that harder. BlazeGraph's analytic query mode should help
there. Ultimately I see the JVM as a risk to mitigate in this case.
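One standard JVM-side way to keep a single expensive query from overwhelming the system is to bound its execution time and cancel it on timeout. The sketch below uses plain `java.util.concurrent`; the `QueryGuard` name and the stand-in query are hypothetical, not BlazeGraph's actual API.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical guard around query execution: run each query on a bounded
// pool and cancel it if it exceeds its time budget.
public class QueryGuard {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Returns the result, or empty if the query timed out or failed.
    public Optional<String> run(Callable<String> query, long timeoutMillis) {
        Future<String> future = pool.submit(query);
        try {
            return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return Optional.empty();
        } catch (InterruptedException | ExecutionException e) {
            future.cancel(true);
            return Optional.empty();
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

The caveat in the JVM, and part of why Java makes this harder, is that cancellation is cooperative: `future.cancel(true)` only interrupts the thread, and a query that never checks its interrupt flag keeps burning CPU.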
Nik