On 22 September 2013 at 21:24:48, Antoine Isaac (aisaac@few.vu.nl) wrote:
> First, getting a clean hierarchy won't make things easier if you end up with an overly static/formal view of the world. Second, the feeling about the W3C recommendations is wrong. W3C has actually pushed SKOS to allow 'softer' classifications to be represented without having to undergo the ordeals and dangers of RDFS/OWL...
> But I realize all this might be regarded as questioning the decision you made earlier to use P31 and P279 instead of the GND type, so I'm going to stop bothering you ;-)
Agreed with some of that.
The primary problem with the GND type is that it tried to reduce the whole world to seven arbitrary categories. I'm not sure how any proposed alternative could be as barmy as that.
My general preference is for simple, unfussy, bubble-up types: look at existing systems that work and follow them as far as possible (and, no, big formal ontology systems do not satisfy the "that work" part of that sentence, nor do indulgent academic thought experiments).
We don't need big design up front. We need to apply common sense and the Pareto Principle, and to avoid the excesses of the academic AI/KR community, which have thus far made things like the RDF/OWL specs impenetrable to the average person who doesn't know what "model-theoretic semantics" means. (They have also left us in a state where we have endless specs on ontological minutiae, but nobody seems bothered about fixing datatypes.)
Indeed, one good thing about RDF is precisely that, because you use URIs to define properties and classes, you can delegate the creation of those classes and properties to subject-matter experts. The biology/medicine people design the schemata they need to represent genes, drugs and so on; if I need a simple property to represent dietary preference, I just coin it and start publishing. On Wikidata, rather than assuming that the ontology people have solved all the problems, it'd be much better to follow actual usage and unify our semantics with others' using relations like owl:sameAs and owl:equivalentProperty.
If I had to suggest some design principles, this is where I would start:
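A minimal sketch of that unification step, in plain Python with triples held as tuples (no RDF library is assumed). Two publishers independently coin a dietary-preference property under their own URIs, and a single owl:equivalentProperty statement lets a query treat them as one. Only the OWL namespace URI is real; the example.org/example.net URIs and the helper functions are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Hypothetical properties coined independently by two publishers.
MY_DIET = "http://example.org/me/dietaryPreference"
THEIR_DIET = "http://example.net/food/diet"

triples = {
    ("http://example.org/people/alice", MY_DIET, "vegetarian"),
    ("http://example.net/people/bob", THEIR_DIET, "vegan"),
    # One linking statement unifies the two vocabularies.
    (MY_DIET, OWL_EQUIVALENT_PROPERTY, THEIR_DIET),
}


def equivalent_properties(prop, graph):
    """Return prop plus every property linked to it by owl:equivalentProperty,
    following links in both directions and transitively."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != OWL_EQUIVALENT_PROPERTY:
                continue
            # equivalentProperty is symmetric, so follow the link either way.
            neighbour = o if s == current else s if o == current else None
            if neighbour is not None and neighbour not in found:
                found.add(neighbour)
                frontier.append(neighbour)
    return found


def values_for(prop, graph):
    """Collect (subject, value) pairs for prop and all of its equivalents."""
    props = equivalent_properties(prop, graph)
    return sorted((s, o) for s, p, o in graph if p in props)


print(values_for(MY_DIET, triples))
# prints [('http://example.net/people/bob', 'vegan'), ('http://example.org/people/alice', 'vegetarian')]
```

Neither publisher had to wait for a central schema: the equivalence is added after the fact, once actual usage shows the two properties mean the same thing.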
1. Prioritise pragmatism and common sense over theoretical unity.
2. Categorisation schemes are used by humans and implemented by humans. Design for humans rather than for hyper-intelligent robots or geniuses.
3. Actual usage takes priority over hypothetical use cases.
4. Use by Wikimedia projects takes priority over use by third parties.
5. Optimise for common use cases, per the Pareto Principle.
6. You can apply two different types to something. Avoid creating compound types that merely combine them. Wikipedia may have "Jewish LGBT scientists from Portugal with a cleft lip", but we don't need to replicate that kind of silliness.
7. If explaining your proposed category/property/schema to the man on the Clapham Omnibus would cause him to laugh to the point where it would disturb his fellow travellers, you need to rethink your proposal.
8. Take your necktie off. You are designing a fancy computer index card system, not going to meet the Queen of England.
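Principle 6 can be sketched in a few lines of Python: each item carries several simple, independent types (the way an item can hold more than one P31 value), and any combined category is computed at query time as an intersection instead of being stored as a type of its own. The item names and type labels here are hypothetical.

```python
# Hypothetical items, each tagged with several simple, independent types
# instead of a single pre-combined category.
items = {
    "item_a": {"scientist", "citizen of Portugal", "writer"},
    "item_b": {"scientist", "citizen of Portugal"},
    "item_c": {"writer"},
}


def with_all_types(required, catalogue):
    """Return the items that carry every type in `required` (set intersection)."""
    return sorted(name for name, types in catalogue.items() if required <= types)


print(with_all_types({"scientist", "citizen of Portugal"}, items))  # ['item_a', 'item_b']
print(with_all_types({"scientist", "writer"}, items))  # ['item_a']
```

No "Portuguese scientist-writers" type ever needs to exist; the combination falls out of the simpler types whenever someone actually asks for it.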
The most amusing thing in the GND discussions (beyond the hilarious defences of why the GND's absurd way of categorising fictional characters, planets, families and so on is actually okay) was the people predicting anarchy if we didn't strictly follow some kind of schema designed by librarians. It's almost as if Wikipedia hadn't happened: the same people would have been saying back in 2001 that an encyclopedia written by random volunteers on the Internet was impossible, the anarchic dream of pot-smoking hippies.
-- Tom Morris http://tommorris.org/