Hi,
I got interested in subclass of (P279) and instance of (P31) statements
recently. I was surprised by two things:
(1) There are quite a lot of subclass of statements: tens of thousands.
(2) Many of them make a lot of sense, and (in particular) are not
(obvious) copies of Wikipedia categories.
My big question is: who is creating all these statements and how is this
done? It seems like too much data to be created manually, but I don't see
obvious automated approaches either (and there are usually no references
given).
I also found some rare issues. "A subclass of B" should be read as
"Every A is also a B". For example, we have "Every piano (Q5994) is also
a keyboard instrument (Q52954)". Overall, the great majority of cases I
looked at had remarkably sane modelling (which reinforces my big question).
But there are still cases where "subclass of" is mixed up with "instance
of". For example, Wikidata also says "Every 'House of Staufen' (Q130875)
is also a dynasty (Q164950)". This is dubious -- how many instances of
'House of Staufen' are there? I guess we really want to say that "The
House of Staufen is a(n instance of) dynasty." Is this an isolated error
or a systematic issue?
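To make the "Every A is also a B" reading concrete, here is a minimal
sketch in Python. It uses toy data, not a query against Wikidata: the
individual "my_piano" and all function names are made up for
illustration, and the House of Staufen is modelled the way I guess it
should be, as an instance of dynasty.

```python
# Toy statements for illustration only; nothing here is fetched from Wikidata.
# "my_piano" is a hypothetical individual, not a real item.

# subclass of (P279): every member of the key class is a member of each value.
SUBCLASS_OF = {
    "piano (Q5994)": {"keyboard instrument (Q52954)"},
}

# instance of (P31): the key is a single individual belonging to each value.
INSTANCE_OF = {
    "my_piano": {"piano (Q5994)"},
    "House of Staufen (Q130875)": {"dynasty (Q164950)"},
}


def superclasses(cls):
    """Return cls plus every class reachable from it via subclass-of."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def classes_of(item):
    """All classes an individual belongs to, directly or by inheritance."""
    result = set()
    for cls in INSTANCE_OF.get(item, ()):
        result |= superclasses(cls)
    return result


def instances_of(cls):
    """All individuals that belong to cls, directly or by inheritance."""
    return {item for item in INSTANCE_OF if cls in classes_of(item)}
```

With this data, "my_piano" comes out as a keyboard instrument because
every piano is one, while the House of Staufen is just one dynasty and
has no instances of its own.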
I guess there is already a group of people who deal with such issues --
or it would be a miracle that things are in such good shape already
:-) I have read the talk page for subclass of, but that does not seem to
explain the origin of all the data we have already. Pointers?
Cheers,
Markus