Hi Stas,

Yes, P31 is always rdf:type and P279 is always rdfs:subClassOf in RDF/OWL exports that use the community interpretation of those properties.  If those exports want to be decidable, then they'll need to omit P31 and P279 claims that have qualifiers -- which some of Markus's RDF/OWL exports already do, as mentioned in my previous message.

Even if exports don't do that, they would still be in conformance with RDF, RDFS, and OWL 2 Full.  But they would not be valid OWL 2 DL, which is what most people target when they want to query large ontologies, because decidability guarantees that reasoning terminates.

If we want to pick up those items in OWL 2 DL exports, then the solution is simple: don't use qualifiers in instance of (P31) or subclass of (P279) claims.  We could capture that information in other ways, e.g. by using non-P31/P279 properties for it on Wikidata.  This would probably be a good idea regardless of OWL 2 DL conformance.
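
To make the export rule concrete, here is a minimal sketch in Python.  It is not how Markus's exports are actually implemented; the Claim type, the export_triples function, and the qualifier shown are all hypothetical, and the values are plain labels standing in for item IDs.  It only illustrates mapping unqualified P31/P279 claims to rdf:type/rdfs:subClassOf and setting qualified ones aside.

```python
from typing import NamedTuple

# Community mapping of the two membership properties to W3C predicates.
PROPERTY_TO_PREDICATE = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}


class Claim(NamedTuple):
    """Hypothetical, simplified stand-in for a Wikidata claim."""
    subject: str
    prop: str
    value: str
    qualifiers: tuple = ()


def export_triples(claims):
    """Return (triples, skipped) for P31/P279 claims.

    Unqualified claims become plain triples; qualified ones are skipped,
    since a plain rdf:type / rdfs:subClassOf triple cannot carry qualifiers.
    Claims for other properties are ignored in this sketch.
    """
    triples, skipped = [], []
    for claim in claims:
        predicate = PROPERTY_TO_PREDICATE.get(claim.prop)
        if predicate is None:
            continue
        if claim.qualifiers:
            skipped.append(claim)
            continue
        triples.append((claim.subject, predicate, claim.value))
    return triples, skipped


example = [
    Claim("Q5", "P279", "Homo"),
    # The qualifier below is invented purely for illustration.
    Claim("Q5", "P31", "taxon", (("hypothetical qualifier", "some value"),)),
]
triples, skipped = export_triples(example)
print(triples)
print(len(skipped), "qualified claim(s) omitted")
```

The skipped list is the part of the data that a strict OWL 2 DL export would lose unless that information is modeled with other properties.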

> same thing can not be an individual and a class

That is incorrect.  OWL 2 DL lifted that old restriction of OWL 1 DL [1]; OWL 2 became a W3C Recommendation in 2009, with a Second Edition in 2012.  It allows class-individual punning while maintaining decidability [2].  So we could say something like "subclass of: Homo" and "instance of: taxon" in the item about human (Q5) while ensuring that queries that use the community-agreed W3C semantics are guaranteed, at least in theory, to terminate.  This is a big deal.
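
As a rough illustration of what punning looks like in the data, the Python sketch below finds names that are used both as a class and as an individual in a set of triples.  The punned_names function is hypothetical and the role inference is deliberately simplified (no owl:Class declarations, labels instead of IRIs); it is only meant to show the pattern, not to validate OWL 2 DL.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def punned_names(triples):
    """Return names that occur in both a class role and an individual role.

    Simplifying assumptions of this sketch: both ends of rdfs:subClassOf
    are classes, the object of rdf:type is a class, and the subject of
    rdf:type is an individual.
    """
    classes, individuals = set(), set()
    for subject, predicate, obj in triples:
        if predicate == RDFS_SUBCLASS_OF:
            classes.update((subject, obj))
        elif predicate == RDF_TYPE:
            individuals.add(subject)
            classes.add(obj)
    return classes & individuals


triples = [
    ("human", RDFS_SUBCLASS_OF, "Homo"),  # human as a class
    ("human", RDF_TYPE, "taxon"),         # human as an individual
]
print(sorted(punned_names(triples)))
```

Under OWL 2 DL, a name that shows up in this result is not an error: the class view and the individual view of the same name are simply treated as separate.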

> if we adhere to the strict notion that pizza is only a subclass, then we would practically never have any instances in the database for wide categories of things

Why would it be a problem to not have instances in Wikidata for wide categories of things?  Let's consider what would happen if we don't explicitly model classes as instances, and thus don't have any explicit instances for a wide category of things where Wikidata does not have items for "strictly interpreted" (i.e. actual) instances. 

We would not only still have items and tons of meaningful data on those items -- e.g. hydrogen and cancer -- but omitting "instance of" on those items would also help ensure ontological correctness and easy interoperability with many major third-party ontologies.

In fact, this is how a vast swath of ontologies already are -- they often consist mostly of classes, with few or no instances.  This is the case with the Disease Ontology, a vetted third-party ontology which is currently being used as the semantic backbone for diseases on Wikidata [3], as well as ontologies like ChEBI for chemistry and many other scientific ontologies.  See also how the Stanford group that develops Protege models pizza as a class and not an instance: http://protege.stanford.edu/ontologies/pizza/pizza.owl

There is an extra layer here to consider.  Pizza could be explicitly modeled as both a class and an instance, albeit rather awkwardly.  Similar to how we could say "Porsche 356 subclass of car" and "Porsche 356 instance of car model", we could also theoretically state something like "pizza subclass of food" and "pizza instance of food class" or somesuch. 

That is called metamodeling.  Opinions differ within the Wikidata community on how widely such explicit metamodeling should be used, but there is consensus that statements like "pizza instance of food" and "Porsche 356 instance of car" are incorrect and not suitable for Wikidata.

> for most people, pizza is a food, not a "subclass of food"

The phrase "is a" is in no way mutually exclusive with "subclass of".  "Is a" is ambiguous -- it can mean the subject is either a class or an instance.  In other words, "is a" can mean either instance of (P31) or subclass of (P279). 

New Wikidata editors often oversimplify instance of (P31) to "is a" because P31 is so widely used where the everyday phrase "is a" fits.  However, in many ontologies, like Disease Ontology or any of the other Open Biomedical Ontologies (OBO) [3], is_a actually resolves to rdfs:subClassOf, i.e. subclass of (P279).  To avoid confusion, when talking in an ontological context as we do with Wikidata classification, it's best to avoid ambiguous phrases like "is a" and favor more precise phrases like "instance of" and "subclass of". 

> that it is *already* what the consensus on Wikidata is

That "consensus" directly conflicts with other consensuses which have established that chemical compounds, diseases, and genes should use subclass of instead of instance of.  Wikidata should not be a disjointed patchwork of knowledge fiefdoms where each community has its own insular, incompatible usage of subclass of and instance of.  This problem is especially acute in chemistry on Wikidata, where chemical elements use "instance of chemical element" even though it has been established that chemical compounds should not use "instance of chemical compound" [4]. 

More importantly, having "instance of chemical element" and (transitively) "subclass of chemical element" in items as we do now is ontologically incorrect.  Hydrogen (Q556) is an example of such modeling.  That state of affairs has been widely recognized as an error in other discussions, e.g. the "Item both instance and subclass" thread that Markus and Denny chimed in on in September 2014 [5].
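
Items modeled this way can be found mechanically.  The Python sketch below is one possible check, assuming the P31 and P279 statements have already been loaded into two plain dictionaries; the function names and the "intermediate class" entries are hypothetical, since the actual subclass chains on Wikidata are not reproduced here.

```python
def superclasses(item, subclass_of):
    """All classes reachable from item via subclass-of links (transitive)."""
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def instance_and_subclass_conflicts(instance_of, subclass_of):
    """Return sorted (item, cls) pairs where item is both an instance of
    cls and, directly or transitively, a subclass of cls."""
    conflicts = []
    for item, classes in instance_of.items():
        ancestors = superclasses(item, subclass_of)
        conflicts.extend((item, cls) for cls in classes if cls in ancestors)
    return sorted(conflicts)


# Labels stand in for item IDs; the intermediate classes are made up.
instance_of = {"hydrogen": ["chemical element"], "pizza": ["food"]}
subclass_of = {
    "hydrogen": ["intermediate class A"],
    "intermediate class A": ["chemical element"],
    "pizza": ["intermediate class B"],
    "intermediate class B": ["food"],
}
print(instance_and_subclass_conflicts(instance_of, subclass_of))
```

Note that this flags only the "instance of X and subclass of X" pattern.  Metamodeling statements such as "Porsche 356 subclass of car" plus "Porsche 356 instance of car model" are not reported, because the two target classes differ.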


1.  http://www.w3.org/TR/owl2-new-features/#F12:_Punning
2.  http://www.comlab.ox.ac.uk/people/boris.motik/pubs/motik07metamodeling-journal.pdf
3.  http://www.obofoundry.org/
4.  https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004695.html
5.  https://lists.wikimedia.org/pipermail/wikidata-l/2014-September/004650.html

On Sun, Oct 18, 2015 at 4:59 PM, Stas Malyshev <smalyshev@wikimedia.org> wrote:

> The community-defined meaning of /subclass of/ (P279) is that of
> rdfs:subClassOf [1].  Similarly, the community-defined meaning of
> /instance of/ (P31) is that of rdf:type [2, 3].

Are you sure it is always correct? AFAIK there are some specific rules
and meanings in OWL that classes should adhere to, also same thing can
not be an individual and a class, and others (not completely sure of the
whole list, as I don't have enough background in RDF/OWL). But I'm not
sure existing data actually follows that.

> There are some open problems with how to handle qualifiers on /instance
> of/ and /subclass of/ in RDF/OWL exports of P31 as rdf:type and P279 as
> rdfs:subClassOf, but that does not negate the community's decision to
> tie its two most basic membership properties to those W3C standard
> properties.  In the current RDF/OWL exports that follow the community

I'm not sure I understand how that works in practice. I.e., if we say
that P31 *is* rdf:type, then it can't be qualified in RDF/OWL and we can
not represent part (albeit small, qualified properties are about 0.2% of
all such properties) of our data.

I mean, we can certainly have data sets which include P31 statements
from the data translated to rdf:type unless they have qualifiers, and
that can be very useful pragmatically, no question about it. But can we
really say P31 is the same as rdf:type and use it whenever we choose to
represent Wikidata data as RDF? I'm not sure about that.

> For example, pizza (https://www.wikidata.org/wiki/Q177) is currently
> modeled as an instance of food and (transitively) a subclass of food.

Here we have another practical issue - if we adhere to the strict notion
that pizza is only a subclass, then we would practically never have any
instances in the database for wide categories of things. I.e. since a
particular food item is rarely notable enough to be featured in
Wikidata, no food would have instances. It may be formally correct but
I'm afraid it's not like most people think - for most people, pizza is a
food, not a "subclass of food". Same with chemistry - as virtually no
actual physical chemical compound (as in "this brown liquid in my test
tube I prepared this morning by mixing contents of those three other
> test tubes") would be notable enough to gain entry in Wikidata,
nothing in chemistry would ever be an instance. Theoretically it may be
sound, but practically I'm not sure it would work well, even more - that
it is *already* what the consensus on Wikidata is.

Stas Malyshev

Wikidata mailing list