On 24.10.2015 09:36, James Heald wrote:
On 24/10/2015 00:50, Stas Malyshev wrote:
Hi!
least one Wikipedia) are considered to refer to equivalent classes on
Wikidata, which could be expressed by a small subclass-of cycle. For
We can do it, but I'd rather we didn't. The reason is that it would
require the engine that queries such data (e.g. a SPARQL engine) to be
comfortable with cycles in property paths (especially ones with + and
*), and not every engine is (Blazegraph, for example, does not appear to
handle them out of the box). It can be dealt with, I assume, but why
create trouble for ourselves?
It should be a basic requirement of any SPARQL engine that it should be
able to handle path queries that contain cycles.
For example, consider equivalence relationships like P460 "said to be
the same as", which is being used to link given names together.
If we want to find all the names in a particular equivalence class, and
e.g. rank them by their incidence count, as is done in the 'query' columns at
https://www.wikidata.org/wiki/Wikidata:WikiProject_Names/given-name_variants
then being able to handle cycles in path queries is a basic requirement
for the job.
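
To make the requirement concrete, here is a minimal Python sketch of what a cycle-safe traversal has to do. It is not how any SPARQL engine implements property paths. The names, the counts, and the helper functions are hypothetical, and the sketch assumes "said to be the same as" links can be followed in both directions.

```python
from collections import deque

def equivalence_class(edges, start):
    """Collect every item linked to `start`, directly or indirectly.

    Assumption: links are treated as undirected. The `seen` set is what
    keeps the walk from looping forever when the links form a cycle.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:  # skip items already visited
                seen.add(nxt)
                queue.append(nxt)
    return seen

def ranked_variants(edges, counts, start):
    """Rank the members of the class by incidence count, highest first."""
    members = equivalence_class(edges, start)
    return sorted(members, key=lambda name: (-counts.get(name, 0), name))

# Hypothetical data: the first three names form a cycle.
same_as = [
    ("Catherine", "Katherine"),
    ("Katherine", "Kathryn"),
    ("Kathryn", "Catherine"),
    ("John", "Jon"),
]
counts = {"Catherine": 120, "Katherine": 95, "Kathryn": 40, "John": 300, "Jon": 25}

print(ranked_variants(same_as, counts, "Kathryn"))
```

The only thing the cycle changes is that the traversal must remember where it has already been; without the visited set the three-name loop would never terminate.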
I agree. Even if we discourage cycles in other cases, there is still no
guarantee that there won't be any, so the engine should be robust
against this.
On the other hand, we have to live with the technical infrastructure we
have. If Blazegraph does not handle cycles well, we should encourage
their team to work on fixing this, but at the same time we need to work
around the issue for a while.
"Said to be the same as" is a good example of a case where cycles are
unavoidable. A possible workaround in this case is to make sure that the
transitive closure of "said to be the same as" is already in the data,
such that the path "P460+" returns the same results as a mere "P460"
would. It's not ideal, but maybe workable.
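
As a rough illustration of that workaround, the following Python sketch materializes the transitive closure of a set of directed links, so that a single step over the result reaches everything a "+" path would reach over the original data. The function name and the sample pairs are hypothetical, and this says nothing about how such statements would actually be maintained on Wikidata.

```python
def transitive_closure(edges):
    """Return all (start, target) pairs reachable in one or more steps.

    Assumption: links are directed, as in a path like "P460+". When the
    links form a cycle, an item reaches itself, so self-pairs appear in
    the result; a real import might choose to drop those.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    closure = set()
    for start in successors:
        seen = set()
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:  # already expanded, so cycles terminate
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure

# Hypothetical three-item cycle: A -> B -> C -> A.
links = [("A", "B"), ("B", "C"), ("C", "A")]
closed = transitive_closure(links)
print(sorted(closed))
```

Once the closure is stored, computing it again adds nothing new, which is the property that would make "P460+" and a mere "P460" agree. The cost is that the number of stored pairs can grow quadratically with the size of each equivalence class.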
Markus