Hi Stas,
Yes, P31 is always rdf:type and P279 is always rdfs:subClassOf in RDF/OWL
exports that use the community interpretation of those properties. If
those exports want to be decidable then they'll need to omit claims that
use P31 or P279 as qualifiers -- which some of Markus's RDF/OWL exports
already do, as mentioned in my previous message.
Even if exports don't do that, they would still conform to RDF, RDFS, and
OWL 2 Full. But they would not be valid OWL 2 DL, which is what most
consumers target when they want to query large ontologies, because OWL 2 DL
is decidable: reasoning over it is guaranteed to terminate.
If we want to pick up those items in OWL 2 DL exports, then the solution is
simple: don't use qualifiers in *instance of* (P31) or *subclass of* (P279)
claims. We could capture that information in other ways, e.g. by using
non-P31/P279 properties for it on Wikidata. This would probably be a good
idea regardless of OWL 2 DL conformance.
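As a rough illustration of that export rule, here is a minimal Python sketch. It assumes a simplified, hypothetical `Claim` record (subject item, property, value item, qualifiers) rather than the real Wikidata dump format. It maps only unqualified P31/P279 claims to rdf:type and rdfs:subClassOf and sets qualified ones aside instead of silently losing them.

```python
from dataclasses import dataclass, field

# Full IRIs of the two W3C properties (RDF and RDFS namespaces).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
WD = "http://www.wikidata.org/entity/"

# Community interpretation: P31 -> rdf:type, P279 -> rdfs:subClassOf.
PROPERTY_MAP = {"P31": RDF_TYPE, "P279": RDFS_SUBCLASS_OF}


@dataclass
class Claim:
    """Hypothetical, simplified claim; not the real Wikidata dump schema."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def export_taxonomy(claims):
    """Turn unqualified P31/P279 claims into (s, p, o) triples.

    Qualified P31/P279 claims are returned separately, since a plain
    rdf:type or rdfs:subClassOf triple has no place to carry qualifiers.
    Claims on other properties are ignored by this sketch.
    """
    triples, skipped = [], []
    for claim in claims:
        predicate = PROPERTY_MAP.get(claim.prop)
        if predicate is None:
            continue
        if claim.qualifiers:
            skipped.append(claim)
            continue
        triples.append((WD + claim.subject, predicate, WD + claim.value))
    return triples, skipped
```

Keeping the skipped claims around makes it easy to report how much data such an OWL 2 DL export drops (Stas estimates qualified P31/P279 statements at about 0.2% of the total).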
> same thing can not be an individual and a class
That is incorrect. That old restriction of OWL 1 DL was lifted in OWL 2 DL,
which became a W3C Recommendation in 2009 (Second Edition in 2012). OWL 2 DL
allows class-individual punning [1] while maintaining decidability [2]. So
we could say something like "subclass of: Homo" and "instance of: taxon" in
the item about human (Q5), while ensuring that queries that use the
community-agreed W3C semantics are guaranteed, at least in theory, to
terminate. This is a big deal.
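To make punning concrete: under OWL 2 DL punning, the same IRI can appear both as a class and as an individual, and a reasoner treats the two uses as separate entities. The short Python sketch below only detects which IRIs in a set of triples are punned in that way; it is not an OWL 2 DL validator. The prefixed names are illustrative, and `ex:Homo` and `ex:taxon` are hypothetical stand-ins for the real Wikidata items.

```python
def punned_iris(triples):
    """Return IRIs used both as a class and as an individual.

    A term counts as a class if it is the object of rdf:type or either
    end of rdfs:subClassOf; it counts as an individual if it is the
    subject of rdf:type.
    """
    classes, individuals = set(), set()
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            individuals.add(subject)
            classes.add(obj)
        elif predicate == "rdfs:subClassOf":
            classes.update((subject, obj))
    return classes & individuals


human = [
    ("wd:Q5", "rdfs:subClassOf", "ex:Homo"),  # human used as a class
    ("wd:Q5", "rdf:type", "ex:taxon"),        # human used as an individual
]
print(punned_iris(human))  # {'wd:Q5'}
```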
> if we adhere to the strict notion that pizza is only a subclass, then we
> would practically never have any instances in the database for wide
> categories of things
Why would it be a problem to not have instances in Wikidata for wide
categories of things? Let's consider what would happen if we don't
explicitly model classes as instances, and thus don't have any explicit
instances for a wide category of things where Wikidata does not have items
for "strictly interpreted" (i.e. actual) instances.
We would not only still have items and tons of meaningful data on those
items -- e.g. hydrogen and cancer -- but omitting *instance of* there would
also help ensure ontological correctness and easy interoperability with
many major third-party ontologies.
In fact, this is how a vast swath of ontologies are already structured --
they often consist mostly of classes, with very few or no instances. This
is the case with the Disease Ontology, a vetted third-party ontology which
is currently being used as the semantic backbone for diseases on Wikidata
[3], as well as ontologies like ChEBI for chemistry and many other
scientific ontologies. See also how the Stanford group that develops
Protégé models pizza as a class and not an instance:
http://protege.stanford.edu/ontologies/pizza/pizza.owl.
There is an extra layer here to consider. Pizza could be explicitly
modeled as both a class and an instance, albeit rather awkwardly. Just as
we could say "Porsche 356 *subclass of* car" and "Porsche 356 *instance of*
car model", we could also theoretically state something like "pizza
*subclass of* food" and "pizza *instance of* food class" or some such.
That is called metamodeling. Opinions differ within the Wikidata community
on how widely such explicit metamodeling should be used, but there is
consensus that statements like "pizza *instance of* food" and "Porsche 356
*instance of* car" are incorrect and not suitable for Wikidata.
> for most people, pizza is a food, not a "subclass of food"
The phrase "is a" is in no way mutually exclusive with "subclass of".
"Is a" is ambiguous -- it can mean the subject is either a class or an
instance. In other words, "is a" can mean either *instance of* (P31) or
*subclass of* (P279).
New Wikidata editors often oversimplify *instance of* (P31) to "is a"
because P31 is so widely used where the everyday phrase "is a" fits.
However, in many ontologies, like Disease Ontology or any of the other Open
Biomedical Ontologies (OBO) [3], *is_a* actually resolves to
rdfs:subClassOf, i.e. *subclass of* (P279). To avoid confusion, when
talking in an ontological context as we do with Wikidata classification,
it's best to avoid ambiguous phrases like "is a" and favor more precise
phrases like "instance of" and "subclass of".
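For a concrete case of *is_a* resolving to rdfs:subClassOf: in the OBO flat file format, an `is_a:` tag inside a `[Term]` stanza states that the term is a subclass of the referenced term. The Python sketch below pulls those tags out as subClassOf triples. It is only an illustration: it skips `[Typedef]` stanzas (where `is_a` relates relations rather than classes), ignores trailing modifiers and every other tag, and the term IDs in the example are made up.

```python
def is_a_triples(obo_text):
    """Map is_a tags in [Term] stanzas to rdfs:subClassOf triples."""
    triples = []
    stanza, term_id = None, None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            stanza, term_id = line, None  # a new stanza begins
        elif stanza != "[Term]":
            continue  # header or non-Term stanza: skipped in this sketch
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("is_a:") and term_id is not None:
            # The parent ID is the first token; any "{...}" modifiers and
            # the trailing "! label" comment are ignored.
            tokens = line[len("is_a:"):].split()
            if tokens:
                triples.append((term_id, "rdfs:subClassOf", tokens[0]))
    return triples


example = """format-version: 1.2

[Term]
id: EX:0002
name: child term
is_a: EX:0001 ! parent term

[Typedef]
id: ex_rel
is_a: ex_parent_rel
"""
print(is_a_triples(example))  # [('EX:0002', 'rdfs:subClassOf', 'EX:0001')]
```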
> that it is *already* what the consensus on Wikidata is
That "consensus" directly conflicts with other consensuses which have
established that chemical compounds, diseases, and genes should use
*subclass of* instead of *instance of*. Wikidata should not be a disjointed
patchwork of knowledge fiefdoms where each community has its own insular,
incompatible usage of *subclass of* and *instance of*. This problem is
especially acute in chemistry on Wikidata, where chemical elements use
"*instance of* chemical element" even though it has been established that
chemical compounds should not use "*instance of* chemical compound" [4].
More importantly, having "*instance of* chemical element" and
(transitively) "*subclass of* chemical element" in the same item, as we do
now, is ontologically incorrect. Hydrogen (Q556) is an example of such
modeling. That state of affairs has been widely recognized as an error in
other discussions, e.g. the "Item both instance and subclass" thread that
Markus and Denny chimed in on in September 2014 [5].
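A check for that error can be sketched in a few lines of Python: follow *subclass of* (P279) edges transitively from an item and flag any class that the item is also directly an *instance of* (P31). The dictionaries and the intermediate class name below are hypothetical toy data, not Hydrogen's actual statements, and the sketch reads from in-memory maps rather than any real Wikidata API.

```python
from collections import deque


def superclasses(item, subclass_of):
    """All classes reachable from item via one or more P279 edges."""
    seen = set()
    queue = deque(subclass_of.get(item, ()))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # already visited; also guards against P279 cycles
        seen.add(cls)
        queue.extend(subclass_of.get(cls, ()))
    return seen


def instance_and_subclass(instance_of, subclass_of):
    """Map each item to the classes it is both an instance and a subclass of."""
    conflicts = {}
    for item, classes in instance_of.items():
        overlap = set(classes) & superclasses(item, subclass_of)
        if overlap:
            conflicts[item] = overlap
    return conflicts


instance_of = {"Q556": ["chemical element"]}
subclass_of = {
    "Q556": ["intermediate class"],  # hypothetical P279 path
    "intermediate class": ["chemical element"],
}
print(instance_and_subclass(instance_of, subclass_of))
# {'Q556': {'chemical element'}}
```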
Eric
1. http://www.w3.org/TR/owl2-new-features/#F12:_Punning
2. http://www.comlab.ox.ac.uk/people/boris.motik/pubs/motik07metamodeling-jour…
3. http://www.obofoundry.org/
4. https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004695.html
5. https://lists.wikimedia.org/pipermail/wikidata-l/2014-September/004650.html
On Sun, Oct 18, 2015 at 4:59 PM, Stas Malyshev <smalyshev(a)wikimedia.org>
wrote:
> Hi!
> > The community-defined meaning of /subclass of/ (P279) is that of
> > rdfs:subClassOf [1]. Similarly, the community-defined meaning of
> > /instance of/ (P31) is that of rdf:type [2, 3].
> Are you sure it is always correct? AFAIK there are some specific rules
> and meanings in OWL that classes should adhere to, also same thing can
> not be an individual and a class, and others (not completely sure of the
> whole list, as I don't have enough background in RDF/OWL). But I'm not
> sure existing data actually follows that.
> > There are some open problems with how to handle qualifiers on /instance
> > of/ and /subclass of/ in RDF/OWL exports of P31 as rdf:type and P279 as
> > rdfs:subClassOf, but that does not negate the community's decision to
> > tie its two most basic membership properties to those W3C standard
> > properties. In the current RDF/OWL exports that follow the community
> I'm not sure I understand how that works in practice. I.e., if we say
> that P31 *is* rdf:type, then it can't be qualified in RDF/OWL and we can
> not represent part (albeit small, qualified properties are about 0.2% of
> all such properties) of our data.
> I mean, we can certainly have data sets which include P31 statements
> from the data translated to rdf:type unless they have qualifiers, and
> that can be very useful pragmatically, no question about it. But can we
> really say P31 is the same as rdf:type and use it whenever we choose to
> represent Wikidata data as RDF? I'm not sure about that.
> > For example, pizza (https://www.wikidata.org/wiki/Q177) is currently
> > modeled as an instance of food and (transitively) a subclass of food.
> Here we have another practical issue - if we adhere to the strict notion
> that pizza is only a subclass, then we would practically never have any
> instances in the database for wide categories of things. I.e. since a
> particular food item is rarely notable enough to be featured in
> Wikidata, no food would have instances. It may be formally correct but
> I'm afraid it's not like most people think - for most people, pizza is a
> food, not a "subclass of food". Same with chemistry - as virtually no
> actual physical chemical compound (as in "this brown liquid in my test
> tube I prepared this morning by mixing contents of those three other
> test tubes") of would be notable enough to gain entry in Wikidata,
> nothing in chemistry would ever be an instance. Theoretically it may be
> sound, but practically I'm not sure it would work well, even more - that
> it is *already* what the consensus on Wikidata is.
> --
> Stas Malyshev
> smalyshev(a)wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata