The standard algorithm for a path search is very simple:
Keep adding a new generation of links, until the new link brings in
no node not already seen.
This works for graphs of equivalence relations, it works for directed
acyclic graphs.
It's not the /graphs/ that are causing the problem here, because
Blazegraph can handle either of them by themself and give the right answer.
Rather, in the query like:
SELECT (COUNT(DISTINCT(?city)) AS ?count) WHERE {
?city wdt:P31/wdt:P279* wd:Q515 . # find instances of subclasses of city
?city wdt:P131* wd:Q1202 .
}
something is going wrong with the way Blazegraph handles the two
conditions *together*.
I suspect this may be closely related to whatever is going wrong with a
query like:
SELECT (COUNT(DISTINCT(?a)) AS ?count) WHERE {
BIND (wd:Q3305213 AS ?class) .
?a wdt:P31/wdt:P279* ?class .
}
which times out.;
It's the plan of joins which is going wrong, not whether the graph is
acyclic or not.
-- James.
On 25/10/2015 17:53, Daniel Kinzler wrote:
"Said to
be the same as" is a good example of a case where cycles are
unavoidable. A possible workaround in this case is to make sure that the
transitive closure of "said to be the same as" is already in the data, such
that
the path "P460+" returns the same results as a mere "P460" would.
It's not
ideal, but maybe workable.
I think we have to distinguish between different use cases:
1) Antisymmetric transitive relations, like subclass-of or part-of, which should
form an acyclic graph. For these, the "*" notation in sparql can be used to
query a sub-graph, such as all kinds of cars or all places in Idaho. This is our
primary use case for path traversal, I believe
2) Symmetric transitive relations, such as "said to be the same as". These
(should) form small "islands" of fully connected graphs that are (hopefully)
unconnected to each other. Here, the "*" notation can be used to include the
entire clique instead of only a single node in a query. This might be useful in
some cases, but doesn't strike me as a typical use case.
3) Cycles in non-transitive properties: these are not errors at all, and
problems only arise when such properties as used in a query as if they were
transitive. We could perhaps detect and reject attempts to apply the "*"
notation to properties that are not transitive.
4) Intransitive symmetrical relations (e.g. "souse of"). Do we need any
special
handling for them, or do they just get treated like (3)?
Anyway: we need a solution for (1) that allows transitive queries, and a
solution for (3) that prevents pathological behavior. If we get nice handling
for case (2), that's a bonus, but not a requirement, I think.