On 23.10.2015 20:19, Stas Malyshev wrote:
> Hi!
>
>> I am happy to announce a new tool [1],
>> written by Serge Stratan,
>> which allows you to browse the taxonomy (subclass of & instance of
>> relations) between Wikidata's most important class items. For
>> example, here is the Wikidata taxonomy for Pizza (discussed recently
>> on this list):
>
> Very nice! One thing especially interesting to me is cycle detection - I just
> discovered there are many (as in, hundreds) cycles in various important
> properties such as P131 which break a lot of queries, and was wondering
> what could be done to expose/eliminate those better.

Yes, this should indeed be fixed, although it is theoretically possible
that two distinct items (with two distinct Wikipedia articles on at
least one Wikipedia) are considered to refer to equivalent classes on
Wikidata, which could be expressed by a small subclass-of cycle. For
example, it is really not clear to me what the difference between some
of our upper level classes is. For instance, what is an example of an
Entity that is not an Object? (Note that Object subsumes intangible
things such as Method and Activity.) Maybe a better way to deal with
this situation would be to avoid using one of the items as a class
altogether, so that no cycle is needed.
We also have/had cycles involving instance-of, which is definitely an
error. ;-)
Counting cycles in SPARQL is likely to be tricky. There are many distinct
cyclic paths through a graph with a few cycles, even if only a small
number of relations are causing them. SPARQL does not count paths as
such, so one would probably count the nodes that lie on some cyclic path,
which would still take all possible cyclic paths into account. The Taxonomy
Browser looks for shortest cycles, but you cannot do this in SPARQL,
because property paths only tell you whether a path exists, not how long
it is. This might be why there are only 11 cycles detected there.
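
To illustrate the shortest-cycle idea outside SPARQL, here is a rough
Python sketch. It is only an assumption about how one could do it, not
the Taxonomy Browser's actual code; the function names and the toy
"subclass of" graph are made up for the example. A breadth-first search
finds a shortest cycle through each item, and deduplicating by node set
collapses the many cyclic paths into a few distinct cycles.

```python
from collections import deque

def shortest_cycle_through(graph, start):
    """Return a shortest cycle start -> ... -> start as a node list, or None.

    `graph` maps each item to the items it points to (e.g. via subclass of).
    """
    parent = {}
    queue = deque()
    for nxt in graph.get(start, ()):
        if nxt == start:
            return [start]  # self-loop: the item points at itself
        if nxt not in parent:
            parent[nxt] = start
            queue.append(nxt)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == start:
                # BFS reached `start` again: walk the parents back to it.
                path = [node]
                while path[-1] != start:
                    path.append(parent[path[-1]])
                path.reverse()
                return path
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def distinct_shortest_cycles(graph):
    """Collect one shortest cycle per distinct set of nodes."""
    cycles = {}
    for node in graph:
        cycle = shortest_cycle_through(graph, node)
        if cycle is not None:
            cycles.setdefault(frozenset(cycle), cycle)
    return list(cycles.values())

# Toy data only, not taken from Wikidata.
toy_subclass_of = {
    "entity": ["object"],
    "object": ["entity"],
    "method": ["object"],
    "food": ["object"],
    "pizza": ["food"],
}

if __name__ == "__main__":
    for cycle in distinct_shortest_cycles(toy_subclass_of):
        print(" -> ".join(cycle + cycle[:1]))
```

In the toy graph, two items lie on a cycle, but only one distinct cycle
is reported, which is the kind of difference between counting nodes and
counting shortest cycles that I mean above.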
Markus