Good to know.

Thanks Nik.  Nice work and forward plans, Team!


On Thu, Mar 5, 2015 at 6:13 PM, Nikolas Everett <neverett@wikimedia.org> wrote:


On Thu, Mar 5, 2015 at 6:47 PM, Thad Guidry <thadguidry@gmail.com> wrote:
Nik,

Will you be incorporating MapGraph, as well, with GPU hardware as part of the scope of the Wikidata Query Service?  Or is that out of scope until you know what the load limits will be, and you'll just use BlazeGraph as is with CPU-bound memory?

MapGraph isn't open source so we won't be using it.
 
What are the scalability plans for also using MapGraph with GPUs and their memory in the future, in case the need for faster graph traversal arises?


So MapGraph is out but otherwise scalability plans are pretty standard stuff:
1.  Instrument for slow stuff
2.  Fix bugs that make it slow
3.  Buy more servers to scale out when fixing bugs (#2) can no longer keep up

These servers would just be replicas.  This fails when the working set grows too large and that is something we'll be watching out for.  BlazeGraph has some horizontal scaling features that we'll invoke if we get there.


Furthermore, this'll all be reasonably easy to run outside of the cluster, so if folks need to take it locally and do things with it that we can't (like MapGraph), it should work well.

I'm certainly wary of Java.  I've worked in Java for years and I'm really familiar with all of its baggage.  BlazeGraph does a very reasonable job with it.  It feels like half of the graph databases are written in Java, and I've always wondered why.  Locking down the SPARQL endpoint so it's "impossible" to overwhelm the system is high on our list of things to do, and Java makes that harder.  BlazeGraph's analytic query mode should help there.  Ultimately I see the JVM as a risk to mitigate in this case.
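One common way to "lock down" an endpoint like this is a wall-clock budget per query, cancelling anything that runs too long.  A minimal Java sketch of that idea, purely illustrative (the class and names here are hypothetical, not Blazegraph's actual API, which has its own timeout configuration):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound each query's wall-clock time so one expensive
// query can't monopolize the endpoint. A fixed-size pool also caps how many
// queries run concurrently.
public class QueryGuard {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final long timeoutMillis;

    public QueryGuard(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the query task, cancelling it if it exceeds the time budget.
    public String run(Callable<String> queryTask) throws Exception {
        Future<String> future = pool.submit(queryTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            throw new TimeoutException("query exceeded " + timeoutMillis + " ms");
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

The catch, as the email notes, is that cancellation in Java is cooperative: `Future.cancel(true)` only interrupts the thread, and code that never checks its interrupt status keeps burning CPU, which is part of why locking things down on the JVM is harder than it sounds.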

Nik

_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech