On 23.10.2015 20:19, Stas Malyshev wrote:
Hi!
I am happy to announce a new tool [1], written by Serge Stratan, which allows you to browse the taxonomy (subclass of & instance of relations) between Wikidata's most important class items. For example, here is the Wikidata taxonomy for Pizza (discussed recently on this list):
Very nice! One feature especially interesting to me is cycle detection - I just discovered that there are many (as in, hundreds of) cycles in various important properties such as P131, which break a lot of queries, and I was wondering what could be done to expose and eliminate them more effectively.
Yes, this should indeed be fixed, although it is theoretically possible that two distinct items (with two distinct Wikipedia articles on at least one Wikipedia) are considered to refer to equivalent classes on Wikidata, which could be expressed by a small subclass-of cycle. For example, it is really not clear to me what the difference between some of our upper-level classes is. What, for instance, is an example of an Entity that is not an Object? (Note that Object subsumes intangible things such as Method and Activity.) Maybe a better way to deal with this situation would be to avoid using one of the two items as a class altogether, so that no cycle is needed.
We also have/had cycles involving instance-of, which is definitely an error. ;-)
Counting cycles in SPARQL is likely to be tricky. A graph with only a few cycles can contain many distinct cyclic paths, even if only a small number of relations is causing them. SPARQL does not count paths as such, so one would probably count the nodes that lie on a cyclic path, which would still take all possible cyclic paths into account. The Taxonomy Browser looks for shortest cycles instead, but you cannot do this in SPARQL. This might be why only 11 cycles are detected there.
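To make the difference between "nodes on a cycle" and "shortest cycles" concrete, here is a minimal Python sketch. It is not the Taxonomy Browser's actual code (I am not describing its implementation here); the toy graph, the item names A to E, and the function names are all made up for illustration. It runs a breadth-first search from each node to find the shortest cycle through it, then deduplicates cycles that are just rotations of each other.

```python
from collections import deque

def shortest_cycle_through(graph, start):
    """Return the shortest directed cycle through `start` as a list of
    nodes (without repeating `start` at the end), or None if there is none."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ == start:
                # Found an edge back to the start: walk the parent links
                # to rebuild the path start -> ... -> node.
                path = []
                cur = node
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                path.reverse()
                return path
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None

def distinct_shortest_cycles(graph):
    """Collect the shortest cycle through every node, treating rotations
    of the same cycle as one."""
    cycles = set()
    for node in graph:
        cycle = shortest_cycle_through(graph, node)
        if cycle is not None:
            i = cycle.index(min(cycle))  # rotate so the smallest name comes first
            cycles.add(tuple(cycle[i:] + cycle[:i]))
    return cycles

# Hypothetical "subclass of" edges: A <-> B is a two-item cycle,
# B -> C -> D -> B is a three-item cycle, and E merely points into them.
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["D"],
    "D": ["B"],
    "E": ["A"],
}

nodes_on_cycles = {n for n in toy_graph
                   if shortest_cycle_through(toy_graph, n) is not None}
cycles = distinct_shortest_cycles(toy_graph)

print(sorted(nodes_on_cycles))  # ['A', 'B', 'C', 'D']
print(sorted(cycles))           # [('A', 'B'), ('B', 'C', 'D')]
```

In this toy graph four items lie on some cycle, but there are only two distinct shortest cycles, which is the kind of gap I would expect between a node-counting SPARQL query and a shortest-cycle search.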
Markus