On 19 Dec 2013, at 4:28 AM, Johannes Kroll <johannes.kroll(a)wikimedia.de> wrote:
Yes. In the link I posted in the mail you quoted, there's an
example query including a set operation. The timing includes setting up
the connection, doing the two queries and the set operation, converting
the result to the line-based format, and transferring that over HTTP.
This is a real-world query, and about the same as you would get in a
tool that runs on Labs which uses CatGraph (minus the overhead from
starting the Curl binary, setting up the connection, and the slight
overhead from HTTP, because you would use plain TCP transfers in such a
tool). You can login to Tool Labs and try various queries yourself.
"Real-world" means little: the world is large--people have different needs.
Having the graph embedded, with high access speed, is different from depending on a
service with a fixed number of primitives.
What if I want to know a centrality measure? Will you implement it for me? If I have to
fetch successor lists and compute it myself, it will be 100-1000x slower. If I ask for a
successor list, how much time per arc, overall, will it take? This is the standard measure
for the speed of a graph representation. I can't infer anything from the example you
quote.
"Deliver an edge in 50ns" sounds impressive,
but this value doesn't mean much without context. What does it mean?
Most basic graph traversal algorithms are linear in the number of arcs traversed. Thus, a
standard and informative measure of the speed of access to a graph (see any paper on the
subject) is how much time it takes to get a successor (say, from an iterator providing the
successors of a node). You don't need any context.
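To make the measure concrete, here is a minimal sketch of how one might time it. It is only an illustration: the toy ring graph, the tuple-based successor lists and the name ns_per_arc are assumptions of mine, not how CatGraph or any particular framework stores or benchmarks a graph, and absolute numbers from an interpreted language say nothing about either.

```python
import time

def ns_per_arc(graph):
    """Scan every successor list once; return (arcs seen, nanoseconds per arc)."""
    arcs = 0
    start = time.perf_counter_ns()
    for node in range(len(graph)):
        # Iterating the successors is the operation whose cost we care about.
        for _successor in graph[node]:
            arcs += 1
    elapsed = time.perf_counter_ns() - start
    per_arc = (elapsed / arcs) if arcs else 0.0
    return arcs, per_arc

# Hypothetical toy graph: node i has the two successors i+1 and i+2 (mod n).
n = 100_000
graph = [((i + 1) % n, (i + 2) % n) for i in range(n)]

arcs, per_arc = ns_per_arc(graph)
print(f"{arcs} arcs scanned, {per_arc:.1f} ns per arc")
```

The point is only that the figure is total scan time divided by the number of arcs delivered, so it can be compared across representations without knowing anything about the query on top.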
Note that it's you claiming that CatGraph is the service I need. I simply think it is a
service with a different goal. I'm sure people love it.
Ciao,
seba