Hi Peter,
The community-defined meaning of *subclass of* (P279) is that of rdfs:subClassOf [1]. Similarly, the community-defined meaning of *instance of* (P31) is that of rdf:type [2, 3].
There are some open problems with how to handle qualifiers on *instance of* and *subclass of* in RDF/OWL exports of P31 as rdf:type and P279 as rdfs:subClassOf, but that does not negate the community's decision to tie its two most basic membership properties to those W3C standard properties. In the current RDF/OWL exports that follow the community interpretation of P31 and P279, e.g. wikidata-taxonomy.nt.gz and wikidata-instances.nt.gz in [4], statements that have qualifiers on either of those properties are simply omitted.
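To make the omission rule concrete, here is a minimal Python sketch of an export step that keeps only unqualified P31 and P279 statements. It is not the code behind the dumps in [4]. The statement dictionaries, the `to_ntriples` helper, the `P_HYPOTHETICAL` qualifier and the placeholder value for throat cancer are all assumptions made for illustration; only the W3C property URIs, the Wikidata entity prefix and the item IDs quoted in this thread are real.

```python
# Real W3C property URIs and Wikidata entity prefix; the rest is illustrative.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ENTITY = "http://www.wikidata.org/entity/"

# Community interpretation: P31 as rdf:type, P279 as rdfs:subClassOf.
PREDICATES = {"P31": RDF_TYPE, "P279": RDFS_SUBCLASS_OF}


def to_ntriples(statements):
    """Yield one N-Triples line per unqualified P31/P279 statement."""
    for statement in statements:
        predicate = PREDICATES.get(statement["property"])
        if predicate is None:
            continue  # not a basic membership property
        if statement.get("qualifiers"):
            continue  # qualified statements are simply omitted
        subject = ENTITY + statement["subject"]
        value = ENTITY + statement["value"]
        yield f"<{subject}> <{predicate}> <{value}> ."


# Hypothetical in-memory statements, not the real Wikibase JSON format.
STATEMENTS = [
    # malaria subclass of disease
    {"subject": "Q12156", "property": "P279", "value": "Q12136"},
    # breast cancer subclass of disease
    {"subject": "Q128581", "property": "P279", "value": "Q12136"},
    # breast cancer subclass of thoracic cancer, with an invented qualifier
    {"subject": "Q128581", "property": "P279", "value": "Q18556617",
     "qualifiers": {"P_HYPOTHETICAL": "some value"}},
    # cause of death (P509) is not a membership property; value is a placeholder
    {"subject": "Q34836", "property": "P509", "value": "Q_PLACEHOLDER"},
]

for line in to_ntriples(STATEMENTS):
    print(line)
```

Only the two unqualified *subclass of* statements come out as triples; the qualified statement and the *cause of death* statement are dropped.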
The community's definition of disease is less established. However, there is consensus that diseases like cancer (Q12078) and malaria (Q12156) are classes. An instance of disease would be a particular case of a disease, i.e. a particular case of an abnormal condition in a particular organism. For example, it would be the particular case of throat cancer that caused U.S. President Ulysses S. Grant to die, as reflected in the Wikidata statement "Ulysses S. Grant *cause of death* throat cancer" [5].
Wikidata has no items on actual instances of disease to my knowledge -- although it does have at least one item about an instance of a symptom [6]. That of course does not mean that such instances of disease do not exist or that they could not theoretically be modeled in some local Wikibase installation (e.g. in a physician's office or a hospital) that uses Wikidata vocabulary to track actual instances of disease, e.g. a particular case of pancreatic cancer in a patient.
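As a rough illustration of what class reasoning would do with such a local instance, the Python sketch below applies the usual RDFS reading: an instance of a class is also an instance of every superclass. The item `case-0001` is a made-up patient case in a hypothetical local Wikibase, and the `inferred_types` helper is my own naming. The *subclass of* links for malaria are the ones cited in [11].

```python
# subclass of (P279) links, using labels instead of item IDs for readability.
SUBCLASS_OF = {
    "malaria": {"disease", "parasitic protozoa infectious disease"},
    "parasitic protozoa infectious disease": {"disease"},
}

# instance of (P31) links; "case-0001" is a hypothetical patient case.
INSTANCE_OF = {
    "case-0001": {"malaria"},
}


def inferred_types(item, instance_of, subclass_of):
    """Return every class the item belongs to under RDFS-style reasoning.

    Starts from the asserted classes and walks subclass-of links upward.
    The visited set also keeps the walk finite if the hierarchy has cycles.
    """
    types = set()
    stack = list(instance_of.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in types:
            types.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return types


print(sorted(inferred_types("case-0001", INSTANCE_OF, SUBCLASS_OF)))
```

The single asserted statement "case-0001 *instance of* malaria" is enough for a query on disease to find the case, which is the queryability benefit that the class-based modeling aims for.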
If you have questions or concerns regarding how diseases are modeled, I would recommend contacting Wikidata editor and disease ontologist Elvira Mitraka (Emitraka) [7], as well as WikiProject Medicine [8] or WikiProject Molecular Biology [9].
Regarding how outsiders can become aware of modeling methodology, I recommend reading https://www.wikidata.org/wiki/Help:Basic_membership_properties and engaging with particular domain modeling groups on Wikidata, e.g. the WikiProjects mentioned above. This mailing list and Wikidata Project Chat [10] are also good places to ask questions.
Finally, regarding your question "Is Wikidata uniform in applying this methodology?", the answer is no. Wikidata's use of *subclass of* and *instance of* varies among (and sometimes within) different domains of knowledge like human occupations, creative work genres, cuisines, and sports. The basic difference in usage among those domains is using *instance of* where others would use *subclass of*.
For example, pizza (https://www.wikidata.org/wiki/Q177) is currently modeled as an instance of food and (transitively) a subclass of food. Problematic indeed! Disease modeling achieves the same goal of easy queryability by making statements like "malaria *subclass of* disease" and "malaria *subclass of* parasitic protozoa infectious disease" [11], where the latter value transitively resolves to disease. This is not only rather redundant, but also makes the *subclass of* hierarchy cyclic and thus not a directed acyclic graph (DAG) due to the situation you note in the item about disease itself. But at least it avoids the more severe problem of being ontologically incorrect as seen in the item on pizza -- and all chemical elements, e.g. hydrogen (Q556) [12].
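Both issues in the disease hierarchy can be checked mechanically. The Python sketch below is a toy version under stated assumptions: the graph uses labels rather than item IDs, the helper names are mine, and the direct self-link on disease is only a stand-in for the loop you noted, since this thread does not spell out how that loop is actually formed.

```python
# Toy subclass-of graph: child -> list of direct parents.
GRAPH = {
    "malaria": ["disease", "parasitic protozoa infectious disease"],
    "parasitic protozoa infectious disease": ["disease"],
    "disease": ["disease"],  # assumed stand-in for the loop at the top
}


def superclasses(graph, start):
    """All classes reachable from start via one or more subclass-of links."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def redundant_links(graph):
    """Direct links (child, parent) already implied via another direct parent."""
    found = []
    for child, parents in graph.items():
        for parent in parents:
            for other in parents:
                if other != parent and parent in superclasses(graph, other):
                    found.append((child, parent))
                    break
    return sorted(found)


def classes_on_cycles(graph):
    """Classes that are (transitively) subclasses of themselves."""
    return sorted(cls for cls in graph if cls in superclasses(graph, cls))


print(redundant_links(GRAPH))
print(classes_on_cycles(GRAPH))
```

On this toy graph the first check reports the direct "malaria *subclass of* disease" link as redundant, and the second reports disease as lying on a cycle, so the hierarchy is not a DAG.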
Regards, Eric https://www.wikidata.org/wiki/User:Emw
1. http://www.w3.org/TR/rdf-schema/#ch_subclassof
2. http://www.w3.org/TR/rdf-schema/#ch_type
3. is a -> instance of. https://www.wikidata.org/w/index.php?title=Property_talk:P31&oldid=25407...
4. http://tools.wmflabs.org/wikidata-exports/rdf/index.php?content=dump_downloa...
5. Ulysses S. Grant: cause of death. https://www.wikidata.org/wiki/Q34836#P509
6. George H. W. Bush vomiting incident. https://www.wikidata.org/wiki/Q5540112
7. https://www.wikidata.org/wiki/User:Emitraka
8. WikiProject Medicine on Wikidata. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine
9. WikiProject Molecular Biology on Wikidata. https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology
10. https://www.wikidata.org/wiki/Wikidata:Project_chat
11. Malaria: subclass of. https://www.wikidata.org/w/index.php?title=Q12156&oldid=259072228#P279
12. Hydrogen. https://www.wikidata.org/w/index.php?title=Q556&oldid=258289050
On Sat, Oct 17, 2015 at 10:25 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
On 10/17/2015 12:55 AM, Thomas Douillard wrote:
I was a bit surprised to see class reasoning used on diseases.
I was not aware of that, do you have links ?
See slide 38 of http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
I was a bit surprised to see class reasoning used on diseases. This depends on a particular modelling methodology.
It's not surprising, as the meaning of properties is community defined (or sub-community defined), so any community can use whatever reasoning technology it wants, as long as it is consistent with the intended meaning of the properties. As Wikidata only stores statements, anyone can apply community-accepted reasoning technologies on top of it. The drawback of this approach was discussed in another thread some days ago: it could become tricky for an ordinary user to understand the path that led to a statement being added, so we have to be careful to always provide information on which bot added an inferred statement, with which reasoning technology or rule, and from which data.
What is the community-defined meaning of subclass of and diseases then?
Here is what I see in Wikidata.
breast cancer (https://www.wikidata.org/wiki/Q128581) has a subclass of (https://www.wikidata.org/wiki/Property:P279) link to both disease (https://www.wikidata.org/wiki/Q12136) and thoracic cancer (https://www.wikidata.org/wiki/Q18556617).
subclass of (https://www.wikidata.org/wiki/Property:P279) is linked via equivalent property (https://www.wikidata.org/wiki/Property:P1628) to http://www.w3.org/2000/01/rdf-schema#subClassOf, which is the subclass relationship between classes.
subclass of (https://www.wikidata.org/wiki/Property:P279) has the English description "all of these items are instances of those items; this item is a class of that item. Not to be confused with Property:P31 (instance of).", which is rather confusing, but appears to be a gloss of the RDFS meaning of http://www.w3.org/2000/01/rdf-schema#subClassOf
Someone looking at all this is thus led to believe that subclass of (https://www.wikidata.org/wiki/Property:P279) is the same as the RDFS meaning of http://www.w3.org/2000/01/rdf-schema#subClassOf
So diseases are classes. They then have instances. They can be reasoned with using techniques borrowed from RDFS.
This is a particular modelling methodology. It has its benefits. It requires a certain view of disease and diseases. The particular instantiation of this modelling methodology, where there is a redundant link to the top of the disease hierarchy and that top loops back to itself, has its own benefits and drawbacks.
A bigger problem than the one you state, I think, is how outsiders can determine that this modelling methodology is in place and understand it adequately to effectively use the information or to contribute more information. There is nothing on the discussion pages for the various diseases that I looked at.
The modelling methodology used here is useful in many other places, including human occupations, creative work genres, cuisines, and sports. Is Wikidata uniform in applying this methodology? If this is not the case, then how is the use of this methodology signalled?
However, I noticed in recent heated debates that some users on frwiki were receptive to the argument that Wikidata only stores statements. These users feared that Wikidata would align the semantics of words and items with the enwiki semantics. They believe in the linguistic hypothesis that words in a language carry some kind of language-dependent meaning of their own, and they feared a "cultural contagion" in which the specific meaning of an English word would contaminate the French word. It has of course been said many times that Wikidata is focused mainly on definitions rather than on words and linguistics, that one definition equals one item, and that Wikidata is the sum of all knowledge, but the argument that finally seemed to be effective was that Wikidata only stores statements and does not enforce constraints. It seems to be effective in convincing them that Wikidata is indeed POV-agnostic.
In my discussion above, I tried to stay away from using the human-language descriptions, preferring an external formal definition. Unfortunately, Wikidata does not have an internal formal definition beyond the simple description of the data structures. This lack, I think, is what makes the human-language descriptions so important in Wikidata. My view is that a stronger formal basis for Wikidata would help to reduce the possibility that descriptions in dominant human languages do indeed push out the other descriptions.
2015-10-16 19:14 GMT+02:00 Peter F. Patel-Schneider <pfpschneider@gmail.com>:
It's very pleasant to hear from someone else who thinks of Wikidata as a knowledge base (or at least hopes that Wikidata can be considered as a knowledge base). Did you get any pushback on this or on your stated Wikidata goal of structuring the sum of all human knowledge?
Did you get any pushback on your section on classification in Wikidata? It seems to me that some of that is rather controversial in the Wikidata community. I was a bit surprised to see class reasoning used on diseases. This depends on a particular modelling methodology.
peter
On 10/12/2015 11:47 AM, Emw wrote:
> Hi all,
>
> On Saturday, I facilitated a workshop at the U.S. National Archives entitled
> "An Ambitious Wikidata Tutorial" as part of WikiConference USA
>
> Slides are available at:
> http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
> https://commons.wikimedia.org/wiki/File:An_Ambitious_Wikidata_Tutorial.pdf
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata