On 29.10.2015 05:41, Benjamin Good wrote:
For what it's worth, I tend to agree with Peter here.
It makes sense to
me to add constraints akin to 'disjoint with' at the class level.
+1 for having this. This does not preclude having an additional
mechanism on the instance level if needed to supplement the class-level
constraints, but the classes are an easier way to start.
This can also help with detecting other issues that are unrelated to
merging. For instance, nothing should be an event and an airplane at the
same time.
We need a common approach on how to deal with ambiguous Wikipedia
articles. One option would be to create an "auxiliary" item that is not
linked to Wikipedia in such a case, but that is used to represent some
aspects of the "main" item that would otherwise be incompatible.
Benjamin is right that these issues are not specific to the bio domain.
It's rather the opposite: the bio domain is one of the domains that is
advanced enough to notice these problems ...
problem I see is that we don't exactly have classes here as the term is
used elsewhere. I guess in wikidata, a 'class' is any entity that
happens to be used in a subclassOf claim ?
In this case, one can leave this to the user: two items that are
specified to be disjoint classes are classes.
In the Wikidata Taxonomy Browser, we consider items as classes if one of
the following is true:
(1) they have a "subclass of" statement
(2) they are the target of a "subclass of" statement
(3) they are the target of an "instance of" statement
We then (mostly) ignore the classes that have no instances or subclasses
of their own (the "leaves" of the taxonomy), since there are very many:
* The above criterion leads to over 200,000 class items.
* Only about 20,000 of them have instances or subclasses.
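
To make the three criteria concrete, here is a minimal sketch in Python. It is not the Taxonomy Browser's actual code. It assumes statements are available as plain (subject, property, object) triples, and the item names in the usage example are made up for illustration. P279 and P31 are the Wikidata IDs for "subclass of" and "instance of".

```python
from typing import Iterable, Set, Tuple

# Assumed input shape (not the Taxonomy Browser's real data model):
# one (subject, property, object) triple per statement.
Statement = Tuple[str, str, str]

SUBCLASS_OF = "P279"  # Wikidata "subclass of"
INSTANCE_OF = "P31"   # Wikidata "instance of"


def find_classes(statements: Iterable[Statement]) -> Set[str]:
    """Collect every item that satisfies criterion (1), (2) or (3)."""
    classes: Set[str] = set()
    for subject, prop, obj in statements:
        if prop == SUBCLASS_OF:
            classes.add(subject)  # (1) has a "subclass of" statement
            classes.add(obj)      # (2) is the target of "subclass of"
        elif prop == INSTANCE_OF:
            classes.add(obj)      # (3) is the target of "instance of"
    return classes


def non_leaf_classes(statements: Iterable[Statement]) -> Set[str]:
    """Classes with at least one direct instance or direct subclass."""
    return {
        obj
        for _subject, prop, obj in statements
        if prop in (SUBCLASS_OF, INSTANCE_OF)
    }


# Hypothetical toy data: B and C are subclasses of A, x is an instance of B.
example = [
    ("B", SUBCLASS_OF, "A"),
    ("C", SUBCLASS_OF, "A"),
    ("x", INSTANCE_OF, "B"),
]
all_classes = find_classes(example)
kept = non_leaf_classes(example)
leaves = all_classes - kept
```

In the toy data, C is a class only by criterion (1), so it ends up among the leaves that we mostly ignore.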
Another way forward could be to do this using properties rather than
classes. I think this could allow us to use the constraint-checking
infrastructure that is already in place? You could add a constraint on
a property that it is 'incompatible with' another property. In the
protein/gene case we could pragmatically use Property:P351 (Entrez Gene
ID), incompatible with Property:P352 (UniProt protein ID). More
semantically, we could use 'encoded by' incompatible-with 'encodes' or
I think the constraint checking infrastructure should be able to handle
both approaches equally well. If "disjoint with" is a statement, one
could even check this constraint in SPARQL (possibly further restricting
to query only for constraint violations in a particular domain).
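
As a rough sketch of what such a check could look like, the Python helper below only builds the query text. It assumes that "disjoint with" is stored as a direct statement between two class items. No such property exists yet, so its ID is a parameter, and the IDs used in the usage lines are placeholders. The function name and parameters are likewise made up for illustration. The wd:/wdt: prefixes, P31 and P279 are the usual Wikidata ones.

```python
from typing import Optional


def build_disjointness_query(disjoint_prop: str,
                             domain_class: Optional[str] = None) -> str:
    """Build a SPARQL query for items that fall under two disjoint classes.

    disjoint_prop: ID of the (hypothetical) "disjoint with" property.
    domain_class:  optional class item ID; if given, only items under
                   that class are checked.
    """
    lines = [
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT DISTINCT ?item ?classA ?classB WHERE {",
        # Pairs of classes that are declared disjoint.
        f"  ?classA wdt:{disjoint_prop} ?classB .",
        # Items that are (indirect) instances of both classes.
        "  ?item wdt:P31/wdt:P279* ?classA .",
        "  ?item wdt:P31/wdt:P279* ?classB .",
    ]
    if domain_class is not None:
        # Restrict the check to one domain, e.g. a single top-level class.
        lines.append(f"  ?item wdt:P31/wdt:P279* wd:{domain_class} .")
    lines.append("}")
    return "\n".join(lines)


# "P0000" and "Q0000" are placeholders, not real Wikidata IDs.
query_all = build_disjointness_query("P0000")
query_domain = build_disjointness_query("P0000", "Q0000")
```

Each result row would be one constraint violation. The unrestricted form may be expensive on the full data because of the P279* paths, which is one reason to restrict it to a domain.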
On Wed, Oct 28, 2015 at 5:08 PM, Peter F. Patel-Schneider