Thus, in order to formally classify ethanol itself as opposed to some particular ethanol molecule, we must say for an item like http://www.wikidata.org/wiki/Q153: "ethanol subclass of chemical compound" and not "ethanol instance of chemical compound". (On Wikidata, the statement is more precisely "ethanol subclass of alcohol", but the statements "alcohol subclass of organic compound" and "organic compound subclass of chemical compound" entail that "ethanol subclass of chemical compound".)
A key difference between talk about cars and talk about chemicals is that, with cars, we have familiar terms like "car model" that distinguish concrete instances (that particular car you see on the street) from abstract "instances" (i.e. metaclasses, classes that are also instances, the kind of car that you see on the street). We do not have a well-known term like "chemical model" or "chemical compound type" to distinguish classes (types) of chemicals from instances (tokens) of chemicals. When one speaks of the properties of ethanol or hydrogen, it is understood that the subject is all concrete, particular, spatiotemporal tokens, i.e. instances of ethanol and hydrogen -- not just a specific ethanol molecule floating in that container before you on a Saturday with friends, but all molecules that we label "ethanol" everywhere.
The statement "ethanol instance of chemical compound" is ontologically incorrect. Importantly, it is also incompatible with ChEBI, the most widely used chemistry ontology.
The matter of how to apply instance of (P31, rdf:type) and subclass of (P279, rdfs:subClassOf) on Wikidata in relation to chemical entities has been, as Thomas puts it, a long discussion [1-5]. Hopefully, with a wider audience and experts like Markus Krötzsch and Denny Vrandečić now interested, we can come to a resolution, at least in the particular domain of chemical compounds.
Since it concerns interoperability with another large Semantic Web project, I have copied Janna Hastings and Alan Ruttenberg on this discussion. Janna coordinates ChEBI. Alan coordinates BFO, the upper ontology used by ChEBI and many other major ontologies in the natural sciences, like Gene Ontology and Disease Ontology.
Denny indicates how the statement "Porsche 356 instance of car" would be incorrect in Wikidata even though "Porsche 356 is a car" is acceptable in everyday speech. Similarly, "ethanol instance of chemical compound" is incorrect in Wikidata even though "ethanol is a chemical compound" is acceptable in less formal contexts.
A common defense of statements like "ethanol instance of chemical compound" is that Wikidata will never have items about any concrete molecules of ethanol, so, since ethanol is a "leaf node" in our concept taxonomy, it makes sense to state that ethanol is an instance. That interpretation of "instance" is short-sighted. It precludes us from ever talking about particular tokens of ethanol, or particular aggregates of such objects, without overhauling our chemistry ontology. Excluding consideration of metaclasses like "chemical compound type", the fact that an entity is a leaf node in a concept hierarchy is a necessary but not sufficient condition for using instance of.
Another common suggestion is that we should state something like "ethanol instance of chemical compound type" and "ethanol subclass of chemical compound".
To see where that gets us, try wrapping your head around this: https://commons.wikimedia.org/wiki/File:Atom_classes.svg. Really, take a look. If we want Wikidata's concept hierarchy to be seen as dauntingly complex, pervasively applying that kind of three-layer classification scheme will do.
The kind of explicit metamodeling seen when punning things like cars and car models, ships and ship classes, biological taxa and organisms, etc. works reasonably well in certain domains. But, while we hold that hammer in one hand, we should be careful not to see everything as a nail. Outside domains that have established vocabulary for metaclasses, imposing explicit metamodeling with statements like "ethanol instance of chemical compound type" or "hydrogen instance of atom type" will strike users as unduly complex.
Without such metamodeling, though, querying for a list of chemical compounds becomes murkier. Surely we would want to return "ethanol" and not "organic compound" in such a list. How about "alcohol"? Relatedly, if we don't state "oxygen instance of chemical element", then how can we easily query for all the elements in the Periodic Table of Elements without including in the results any potential subclasses of oxygen (e.g., isotopes of oxygen like oxygen-16, oxygen-17, etc.)?
There are ways to achieve that in SPARQL using rdfs:subClassOf / P279 / subclass of, but they require adhering to certain conventions. When faced with requiring many potential query users to learn some Wikidata MetaObject Protocol, though, I'm inclined to make some sacrifices for simplicity, ontological correctness, and consistency with major existing ontologies.
In summary, this ball has been punted for over a year now. Because of the impasse in how to classify chemical entities, we now have showcase items with obvious problems, like entailing that something is both a class and an instance of chemical compound. We need input from a wider group of people knowledgeable about ontology or chemistry, ideally both. Hopefully with a Wikimedian in Residence at the Royal Society of Chemistry [6] we'll get some more focused resources on this. All major scientific ontologies use subclass of (rdfs:subClassOf), not instance of (rdf:type), to classify such things. In my opinion, Wikidata should maintain technical and philosophical compatibility with ontologies like ChEBI and remove statements like "ethanol instance of chemical compound". This would improve interoperability between Wikidata and the rest of the Semantic Web.
1. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Forth_and_back_conversions_of_items_between_class_and_instance
2. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/05#chemical_element.
3. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry#Germanium_subclass_tree.
4. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Subclass_of_two_different_things
5. https://www.wikidata.org/wiki/Help_talk:Basic_membership_properties#Proposition_of_definition
6. http://pigsonthewing.org.uk/wikimedian-residence-royal-society-chemistry/
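As a rough illustration of why querying with subclass of alone depends on conventions, here is a minimal sketch in Python rather than SPARQL, so that it runs without an endpoint. The statement set is a toy assumption, not a dump of Wikidata, and the helper names (superclasses, direct_subclasses, leaf_subclasses) are hypothetical. The superclasses helper follows subclass of links transitively, which is roughly what a SPARQL 1.1 property path over P279 with the + operator would do.

```python
# Toy "subclass of" (P279) statements. This is an illustrative assumption,
# not actual Wikidata content.
SUBCLASS_OF = {
    "ethanol": {"alcohol"},
    "alcohol": {"organic compound"},
    "organic compound": {"chemical compound"},
    "oxygen": {"chemical element"},
    "oxygen-16": {"oxygen"},
    "oxygen-17": {"oxygen"},
    "hydrogen": {"chemical element"},
}


def superclasses(item):
    """Return every class reachable from item via one or more subclass-of steps."""
    seen = set()
    stack = list(SUBCLASS_OF.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


def direct_subclasses(cls):
    """Convention A: items stated to be an immediate subclass of cls."""
    return {item for item, parents in SUBCLASS_OF.items() if cls in parents}


def leaf_subclasses(cls):
    """Convention B: transitive subclasses of cls that have no subclasses themselves."""
    has_children = {p for parents in SUBCLASS_OF.values() for p in parents}
    return {
        item
        for item in SUBCLASS_OF
        if cls in superclasses(item) and item not in has_children
    }


if __name__ == "__main__":
    # Entailment: ethanol is a subclass of chemical compound via alcohol.
    print("chemical compound" in superclasses("ethanol"))
    print(sorted(direct_subclasses("chemical element")))
    print(sorted(leaf_subclasses("chemical element")))
    print(sorted(direct_subclasses("chemical compound")))
    print(sorted(leaf_subclasses("chemical compound")))
```

With these toy statements, the leaf convention returns ethanol for chemical compound but returns the isotopes instead of oxygen for chemical element. The direct-subclass convention returns oxygen and hydrogen for chemical element but only organic compound for chemical compound. Neither convention alone answers both queries, which is the kind of per-domain agreement the subclass-only approach would need.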
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l