Hi Peter,

The community-defined meaning of subclass of (P279) is that of rdfs:subClassOf [1].  Similarly, the community-defined meaning of instance of (P31) is that of rdf:type [2, 3]. 

There are some open problems with how to handle qualifiers on instance of and subclass of in RDF/OWL exports of P31 as rdf:type and P279 as rdfs:subClassOf, but that does not negate the community's decision to tie its two most basic membership properties to those W3C standard properties.  In the current RDF/OWL exports that follow the community interpretation of P31 and P279, e.g. wikidata-taxonomy.nt.gz and wikidata-instances.nt.gz in [4], statements that have qualifiers on either of those properties are simply omitted.
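To illustrate the omission rule described above, here is a minimal Python sketch. It is not the code behind the exports in [4]; the statement dictionaries, the `to_ntriples` helper and the qualifier on the second statement are assumptions made up for illustration. Only the two W3C predicate URIs and the Wikidata entity namespace are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ENTITY = "http://www.wikidata.org/entity/"

# Community interpretation: P31 maps to rdf:type, P279 to rdfs:subClassOf.
PROPERTY_TO_PREDICATE = {"P31": RDF_TYPE, "P279": RDFS_SUBCLASS_OF}


def to_ntriples(statements):
    """Emit one N-Triples line per unqualified P31/P279 statement."""
    lines = []
    for s in statements:
        predicate = PROPERTY_TO_PREDICATE.get(s["property"])
        # Skip other properties, and skip any statement that has qualifiers.
        if predicate is None or s.get("qualifiers"):
            continue
        lines.append(
            f"<{ENTITY}{s['subject']}> <{predicate}> <{ENTITY}{s['value']}> ."
        )
    return lines


# Hypothetical input format; the qualifier below is invented for the example.
statements = [
    {"subject": "Q12156", "property": "P279", "value": "Q12136", "qualifiers": {}},
    {"subject": "Q128581", "property": "P279", "value": "Q12136",
     "qualifiers": {"hypothetical-qualifier": "some value"}},
]

for line in to_ntriples(statements):
    print(line)
```

Only the malaria statement is printed; the qualified breast cancer statement is dropped, which is the lossy behaviour that makes qualifier handling an open problem.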

The community's definition of disease is less established.  However, there is consensus that diseases like cancer (Q12078) and malaria (Q12156) are classes.  An instance of disease would be a particular case of a disease, i.e. a particular case of an abnormal condition in a particular organism.  For example, it would be the particular case of throat cancer that caused U.S. President Ulysses S. Grant to die, as reflected in the Wikidata statement "Ulysses S. Grant cause of death throat cancer" [5]. 
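The class/instance distinction above can be sketched in a few lines of Python. This is only an illustration of RDFS-style type inference (an instance of a class is also an instance of every superclass), not how Wikidata or any reasoner implements it. The subclass edges are simplified assumptions rather than statements copied from Wikidata, and the item for Grant's case is hypothetical.

```python
from collections import deque

# Simplified subclass of (P279) edges, assumed for this sketch.
SUBCLASS_OF = {
    "throat cancer": {"cancer"},
    "cancer": {"disease"},
}

# Hypothetical instance of (P31) statement for a particular case of disease.
INSTANCE_OF = {
    "Grant's case of throat cancer": {"throat cancer"},
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls by one or more subclass of edges."""
    seen = set()
    queue = deque(subclass_of.get(cls, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(subclass_of.get(current, ()))
    return seen


def inferred_types(item, instance_of, subclass_of):
    """Direct classes of item plus all of their superclasses."""
    direct = instance_of.get(item, set())
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, subclass_of)
    return result


print(sorted(inferred_types("Grant's case of throat cancer", INSTANCE_OF, SUBCLASS_OF)))
# ['cancer', 'disease', 'throat cancer']
```

Note that "cancer" itself gets no inferred types here: it is a class related to disease by subclass of, not an instance of disease.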

Wikidata has no items on actual instances of disease to my knowledge -- although it does have at least one item about an instance of a symptom [6].  That of course does not mean that such instances of disease do not exist or that they could not theoretically be modeled in some local Wikibase installation (e.g. in a physician's office or a hospital) that uses Wikidata vocabulary to track actual instances of disease, e.g. a particular case of pancreatic cancer in a patient.

If you have questions or concerns regarding how diseases are modeled, I would recommend contacting Wikidata editor and disease ontologist Elvira Mitraka (Emitraka) [7], as well as WikiProject Medicine [8] or WikiProject Molecular Biology [9]. 

Regarding how outsiders can become aware of modeling methodology, I recommend reading https://www.wikidata.org/wiki/Help:Basic_membership_properties and engaging with particular domain modeling groups on Wikidata, e.g. the WikiProjects mentioned above.  This mailing list and Wikidata Project Chat [10] are also good places to ask questions.

Finally, regarding your question "Is Wikidata uniform in applying this methodology?", the answer is no.  Wikidata's use of subclass of and instance of varies among (and sometimes within) different domains of knowledge like human occupations, creative work genres, cuisines, and sports.  The basic difference in usage among those domains is using instance of where others would use subclass of.

For example, pizza (https://www.wikidata.org/wiki/Q177) is currently modeled as an instance of food and (transitively) a subclass of food.  Problematic indeed!  Disease modeling achieves the same goal of easy queryability by making statements like "malaria subclass of disease" and "malaria subclass of parasitic protozoa infectious disease" [11], where the latter value transitively resolves to disease.  This is not only rather redundant, but also makes the subclass of hierarchy cyclic and thus not a directed acyclic graph (DAG) due to the situation you note in the item about disease itself.  But at least it avoids the more severe problem of being ontologically incorrect as seen in the item on pizza -- and all chemical elements, e.g. hydrogen (Q556) [12].
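The redundancy and the cycle mentioned above can be detected mechanically. The following Python sketch assumes a small hand-written graph: the two malaria edges follow [11], the self-loop on disease reflects the situation noted in this thread, and the intermediate "infectious disease" step is my assumption about how the latter value resolves to disease. The function names are made up for the example.

```python
# subclass of edges, child -> set of direct parents (partly assumed, see above).
SUBCLASS_OF = {
    "malaria": {"disease", "parasitic protozoa infectious disease"},
    "parasitic protozoa infectious disease": {"infectious disease"},
    "infectious disease": {"disease"},
    "disease": {"disease"},
}


def reachable(start, graph):
    """All classes reachable from start by following one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def redundant_edges(graph):
    """Edges (child, parent) whose parent is also reachable via another direct parent."""
    found = set()
    for child, parents in graph.items():
        for parent in parents:
            if parent == child:
                continue  # self-loops are reported by classes_on_cycles instead
            others = parents - {parent, child}
            if any(parent in reachable(other, graph) for other in others):
                found.add((child, parent))
    return found


def classes_on_cycles(graph):
    """Classes that can reach themselves, so the hierarchy is not a DAG."""
    return {cls for cls in graph if cls in reachable(cls, graph)}


print(redundant_edges(SUBCLASS_OF))    # {('malaria', 'disease')}
print(classes_on_cycles(SUBCLASS_OF))  # {'disease'}
```

A check like this only flags structural redundancy and cycles. It cannot detect the ontological problem on pizza, where instance of is used in place of subclass of.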


1.  http://www.w3.org/TR/rdf-schema/#ch_subclassof
2.  http://www.w3.org/TR/rdf-schema/#ch_type
3.  is a -> instance of.  https://www.wikidata.org/w/index.php?title=Property_talk:P31&oldid=254073736#is_a_-.3E_instance_of
4.  http://tools.wmflabs.org/wikidata-exports/rdf/index.php?content=dump_download.php&dump=20150928
5.  Ulysses S. Grant: cause of death.  https://www.wikidata.org/wiki/Q34836#P509
6.  George H. W. Bush vomiting incident.  https://www.wikidata.org/wiki/Q5540112
7.  https://www.wikidata.org/wiki/User:Emitraka
8.  WikiProject Medicine on Wikidata.  https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine
9.  WikiProject Molecular Biology on Wikidata.  https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology
10.  https://www.wikidata.org/wiki/Wikidata:Project_chat
11.  Malaria: subclass of.  https://www.wikidata.org/w/index.php?title=Q12156&oldid=259072228#P279
12.  Hydrogen.  https://www.wikidata.org/w/index.php?title=Q556&oldid=258289050

On Sat, Oct 17, 2015 at 10:25 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
On 10/17/2015 12:55 AM, Thomas Douillard wrote:
>> I was a bit surprised to see class reasoning used on diseases.
> I was not aware of that, do you have links ?

See slide 38 of http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial

>> I was a bit surprised to see class reasoning used on diseases.
>> This depends on a particular modelling methodology.
> It's not surprising, as the meaning of properties is community-defined (or
> sub-community-defined), so any community can use whatever reasoning technology
> it wants, as long as it is consistent with the intended meaning of the
> properties. As Wikidata only stores statements, anyone can apply
> community-accepted reasoning technologies on top of them. The drawback of this
> approach was discussed on another thread some days ago: it could become tricky
> for a simple user to understand the path that led to a statement's addition,
> and we have to be careful to always provide information on which bot added
> inferred statements, with which reasoning technology or rule, and from which data.

What is the community-defined meaning of subclass of and diseases then?

Here is what I see in Wikidata.

https://www.wikidata.org/wiki/Q128581 breast cancer
has a
https://www.wikidata.org/wiki/Property:P279 subclass of
link to both
https://www.wikidata.org/wiki/Q12136 disease
https://www.wikidata.org/wiki/Q18556617 thoracic cancer

https://www.wikidata.org/wiki/Property:P279 subclass of
is linked via
https://www.wikidata.org/wiki/Property:P1628 equivalent property
which is the subclass relationship between classes.

https://www.wikidata.org/wiki/Property:P279 subclass of
has English description
all of these items are instances of those items; this item is a class of that
item. Not to be confused with Property:P31 (instance of).
which is rather confusing, but appears to be a gloss of the RDFS meaning of

Someone looking at all this is thus led to believe that
https://www.wikidata.org/wiki/Property:P279 subclass of
is the same as the RDFS meaning of

So diseases are classes.  They then have instances.  They can be reasoned with
using techniques borrowed from RDFS.

This is a particular modelling methodology.  It has its benefits.  It requires
a certain view of disease and diseases.  The particular instantiation of this
modelling methodology, where there is a redundant link to the top of the
disease hierarchy and that top loops back to itself, has its own benefits and

A bigger problem than the one you state, I think, is how outsiders can
determine that this modelling methodology is in place and understand it
adequately to effectively use the information or to contribute more
information.  There is nothing on the discussion pages for the various
diseases that I looked at.

The modelling methodology used here is useful in many other places, including
human occupations, creative work genres, cuisines, and sports.   Is Wikidata
uniform in applying this methodology?  If this is not the case, then how is
the use of this methodology signalled?

> However, I noticed in recent heated debates that some users on frwiki were
> receptive to the argument that Wikidata only stores statements. These users
> feared that Wikidata would align the semantics of words and items with the
> enwiki semantics. They believe in the linguistic hypothesis that words in a
> language carry some kind of language-dependent meaning of their own, and
> feared a kind of "cultural contagion" in which the specific meaning of an
> English word would contaminate the French word. It has of course been said
> many times that Wikidata is not focused on words and linguistics but mainly on
> definitions, that one definition equals one item, and that Wikidata is the sum
> of all knowledge, but the argument that finally seemed to be effective was
> that Wikidata only stores statements and does not enforce constraints. It
> seems to have been effective in convincing them that Wikidata is indeed
> POV-agnostic.

In my discussion above, I tried to stay away from using the human-language
descriptions, preferring an external formal definition.  Unfortunately,
Wikidata does not have an internal formal definition beyond the simple
description of the data structures.  This lack, I think, is what makes the
human-language descriptions so important in Wikidata.  My view is that a
stronger formal basis for Wikidata would help to reduce the possibility that
descriptions in dominant human languages do indeed push out the other

> 2015-10-16 19:14 GMT+02:00 Peter F. Patel-Schneider <pfpschneider@gmail.com>:
>     It's very pleasant to hear from someone else who thinks of Wikidata as a
>     knowledge base (or at least hopes that Wikidata can be considered as a
>     knowledge base).  Did you get any pushback on this or on your stated Wikidata
>     goal of structuring the sum of all human knowledge?
>     Did you get any pushback on your section on classification in Wikidata?  It
>     seems to me that some of that is rather controversial in the Wikidata
>     community.  I was a bit surprised to see class reasoning used on diseases.
>     This depends on a particular modelling methodology.
>     peter
>     On 10/12/2015 11:47 AM, Emw wrote:
>     > Hi all,
>     >
>     > On Saturday, I facilitated a workshop at the U.S. National Archives entitled
>     > "An Ambitious Wikidata Tutorial" as part of WikiConference USA 2015.
>     >
>     > Slides are available at:
>     > http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
>     > https://commons.wikimedia.org/wiki/File:An_Ambitious_Wikidata_Tutorial.pdf
