The standard algorithm for a path search is very simple:
Keep adding a new generation of links, until the new link brings in no node not already seen.
This works for graphs of equivalence relations, it works for directed acyclic graphs.
It's not the /graphs/ that are causing the problem here, because Blazegraph can handle either of them by themself and give the right answer.
Rather, in the query like:
SELECT (COUNT(DISTINCT(?city)) AS ?count) WHERE { ?city wdt:P31/wdt:P279* wd:Q515 . # find instances of subclasses of city ?city wdt:P131* wd:Q1202 . }
something is going wrong with the way Blazegraph handles the two conditions *together*.
I suspect this may be closely related to whatever is going wrong with a query like:
SELECT (COUNT(DISTINCT(?a)) AS ?count) WHERE { BIND (wd:Q3305213 AS ?class) . ?a wdt:P31/wdt:P279* ?class . }
which times out.;
It's the plan of joins which is going wrong, not whether the graph is acyclic or not.
-- James.
On 25/10/2015 17:53, Daniel Kinzler wrote:
"Said to be the same as" is a good example of a case where cycles are unavoidable. A possible workaround in this case is to make sure that the transitive closure of "said to be the same as" is already in the data, such that the path "P460+" returns the same results as a mere "P460" would. It's not ideal, but maybe workable.
I think we have to distinguish between different use cases:
- Antisymmetric transitive relations, like subclass-of or part-of, which should
form an acyclic graph. For these, the "*" notation in sparql can be used to query a sub-graph, such as all kinds of cars or all places in Idaho. This is our primary use case for path traversal, I believe
- Symmetric transitive relations, such as "said to be the same as". These
(should) form small "islands" of fully connected graphs that are (hopefully) unconnected to each other. Here, the "*" notation can be used to include the entire clique instead of only a single node in a query. This might be useful in some cases, but doesn't strike me as a typical use case.
- Cycles in non-transitive properties: these are not errors at all, and
problems only arise when such properties as used in a query as if they were transitive. We could perhaps detect and reject attempts to apply the "*" notation to properties that are not transitive.
- Intransitive symmetrical relations (e.g. "souse of"). Do we need any special
handling for them, or do they just get treated like (3)?
Anyway: we need a solution for (1) that allows transitive queries, and a solution for (3) that prevents pathological behavior. If we get nice handling for case (2), that's a bonus, but not a requirement, I think.