On 29.10.2015 05:41, Benjamin Good wrote:
For what it's worth, I tend to agree with Peter here. It makes sense to me to add constraints akin to 'disjoint with' at the class level.
+1 for having this. This does not preclude having an additional mechanism on the instance level, if needed, to augment the main one, but classes are an easier way to start.
This can also help with detecting other issues that are unrelated to merging. For instance, nothing should be an event and an airplane at the same time.
We need a common approach on how to deal with ambiguous Wikipedia articles. One option would be to create an "auxiliary" item that is not linked to Wikipedia in such a case, but that is used to represent some aspects of the "main" item that would otherwise be incompatible.
Benjamin is right that these issues are not specific to the bio domain. It's rather the opposite: the bio domain is one of the domains that are advanced enough to notice these problems ...
The problem I see is that we don't exactly have classes here as the term is used elsewhere. I guess in Wikidata, a 'class' is any entity that happens to be used in a subclassOf claim?
In this case, one can leave it to the user: any two items that are declared to be disjoint classes are thereby classes.
In the Wikidata Taxonomy Browser, we consider items as classes if one of the following is true:
(1) they have a "subclass of" statement
(2) they are the target of a "subclass of" statement
(3) they are the target of an "instance of" statement
We then (mostly) ignore the classes that have no instances or subclasses of their own (the "leaves" in the taxonomy), since there are very many of these:
* The above criterion leads to over 200,000 class items.
* Only about 20,000 of them have instances or subclasses.
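As a rough illustration of the three criteria and the leaf filter, here is a minimal Python sketch. It runs on a toy in-memory list of (subject, property, object) triples. The item names and the helpers `find_classes` and `non_leaf_classes` are made up for this example and are not the Taxonomy Browser's actual code. P279 is "subclass of" and P31 is "instance of".

```python
SUBCLASS_OF = "P279"  # "subclass of"
INSTANCE_OF = "P31"   # "instance of"

def find_classes(triples):
    """Return all items that satisfy at least one of the three criteria."""
    classes = set()
    for subject, prop, obj in triples:
        if prop == SUBCLASS_OF:
            classes.add(subject)  # (1) has a "subclass of" statement
            classes.add(obj)      # (2) is the target of a "subclass of" statement
        elif prop == INSTANCE_OF:
            classes.add(obj)      # (3) is the target of an "instance of" statement
    return classes

def non_leaf_classes(triples):
    """Return classes that have at least one instance or subclass of their own."""
    return {obj for _, prop, obj in triples if prop in (SUBCLASS_OF, INSTANCE_OF)}

# Toy data; the item names are hypothetical stand-ins for Wikidata item IDs.
triples = [
    ("airplane", SUBCLASS_OF, "aircraft"),
    ("aircraft", SUBCLASS_OF, "vehicle"),
    ("Spirit of St. Louis", INSTANCE_OF, "aircraft"),
]

all_classes = find_classes(triples)
inner = non_leaf_classes(triples)
leaves = all_classes - inner  # classes without own instances or subclasses
```

On this toy data, "airplane" is a class only because it has a "subclass of" statement, so it ends up among the leaves, while "aircraft" and "vehicle" are kept.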
Another way forward could be to do this using properties rather than classes. I think this could allow us to use the constraint-checking infrastructure that is already in place? You could add a constraint on a property that it is 'incompatible with' another property. In the protein/gene case we could pragmatically use Property:P351 (Entrez Gene ID), incompatible with Property:P352 (UniProt protein ID). More semantically, we could use 'encoded by' incompatible-with 'encodes' or 'genomic start'.
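To make the property-based idea concrete, here is a small Python sketch of such a check. It assumes a hypothetical 'incompatible with' declaration stored as a list of property pairs and a toy mapping from item IDs to the properties they use. The item names and the function `incompatible_property_violations` are invented for illustration and do not correspond to the existing constraint-checking code.

```python
# Hypothetical declaration: pairs of properties that should not co-occur on one item.
INCOMPATIBLE = [("P351", "P352")]  # Entrez Gene ID vs. UniProt ID

def incompatible_property_violations(items, incompatible=INCOMPATIBLE):
    """Return a list of (item_id, prop_a, prop_b) for items using both properties of a pair."""
    violations = []
    for item_id, props in items.items():
        for prop_a, prop_b in incompatible:
            if prop_a in props and prop_b in props:
                violations.append((item_id, prop_a, prop_b))
    return violations

# Toy data: each (made-up) item ID maps to the set of properties it has statements for.
items = {
    "gene_item": {"P351"},
    "protein_item": {"P352"},
    "merged_item": {"P351", "P352"},  # e.g. the result of a wrong gene/protein merge
}

violations = incompatible_property_violations(items)
```

Only "merged_item" is reported here, since it is the only item carrying both identifiers.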
I think the constraint checking infrastructure should be able to handle both approaches equally well. If "disjoint with" is a statement, one could even check this constraint in SPARQL (possibly further restricting to query only for constraint violations in a particular domain).
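The class-level check can be sketched in a few lines of Python on toy data. This assumes a hypothetical "disjoint with" relation given as pairs of classes. The item and class names and the functions `superclasses` and `disjointness_violations` are made up for this example. The transitive walk over "subclass of" is roughly what a SPARQL property path such as `wdt:P31/wdt:P279*` would express.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against cycles in the taxonomy
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen

def disjointness_violations(instance_of, subclass_of, disjoint_pairs):
    """Return (item, class_a, class_b) for items that fall under two disjoint classes."""
    violations = []
    for item, direct_classes in instance_of.items():
        all_classes = set()
        for cls in direct_classes:
            all_classes |= superclasses(cls, subclass_of)
        for class_a, class_b in disjoint_pairs:
            if class_a in all_classes and class_b in all_classes:
                violations.append((item, class_a, class_b))
    return violations

# Toy data with hypothetical names.
subclass_of = {"airplane": ["aircraft"], "air show": ["event"]}
instance_of = {
    "item_1": ["airplane"],
    "item_2": ["airplane", "air show"],  # both an airplane and an event
}
disjoint_pairs = [("aircraft", "event")]  # hypothetical "disjoint with" statement

violations = disjointness_violations(instance_of, subclass_of, disjoint_pairs)
```

Restricting the check to one domain would amount to passing in only the items and disjointness pairs of that domain.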
Cheers,
Markus
On Wed, Oct 28, 2015 at 5:08 PM, Peter F. Patel-Schneider