Hi,
I got interested in subclass of (P279) and instance of (P31) statements recently. I was surprised by two things:
(1) There are quite a lot of subclass of statements: tens of thousands. (2) Many of them make a lot of sense, and (in particular) are not (obvious) copies of Wikipedia categories.
My big question is: who is creating all these statements and how is this done? It seems too much data to be created manually, but I don't see obvious automated approaches either (and there are usually no references given).
I also found some rare issues. "A subclass of B" should be read as "Every A is also a B". For example, we have "Every piano (Q5994) is also a keyboard instrument (Q52954)". Overall, the great majority of cases I looked at had remarkably sane modelling (which reinforces my big question).
But there are still cases where "subclass of" is mixed up with "instance of". For example, Wikidata also says "Every 'House of Staufen' (Q130875) is also a dynasty (Q164950)". This is dubious -- how many instances of 'House of Staufen' are there? I guess we really want to say that "The House of Staufen is a(n instance of) dynasty." Is this an isolated error or a systematic issue?
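To make the "Every A is also a B" reading concrete, here is a minimal Python sketch over a few hand-written toy statements. It is not how Wikidata works internally, and it does not query real data: the Q-ids are the ones mentioned above, "my_piano" is a made-up item, and all function names are my own. The last function is only a rough hint for manual review, on the assumption that a class with no instances at all might really be an individual.

```python
# Toy statements only: the Q-ids come from this email, "my_piano" is a made-up item.
SUBCLASS_OF = {
    "Q5994": {"Q52954"},     # piano -> keyboard instrument
    "Q130875": {"Q164950"},  # House of Staufen -> dynasty (the dubious statement)
}
INSTANCE_OF = {
    "my_piano": {"Q5994"},   # hypothetical item, an instance of piano
}


def superclasses(cls):
    """Return cls plus every class reachable via subclass-of (transitively)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def classes_of(item):
    """All classes an item belongs to, applying "every A is also a B"."""
    result = set()
    for cls in INSTANCE_OF.get(item, ()):
        result |= superclasses(cls)
    return result


def instances_of(cls):
    """All items that are, directly or by inference, instances of cls."""
    return {item for item in INSTANCE_OF if cls in classes_of(item)}


def subclasses_without_instances():
    """Subjects of subclass-of that have no instances in the toy data.

    Only a hint: many perfectly good classes simply have no instances yet.
    """
    return sorted(cls for cls in SUBCLASS_OF if not instances_of(cls))
```

With this toy data, classes_of("my_piano") contains both Q5994 and Q52954, while subclasses_without_instances() flags only Q130875, which matches the intuition that nothing is "a House of Staufen".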
I guess there is already a group of people who deal with such issues -- or it would be a miracle that things are in such good shape already :-) I have read the talk page for subclass of, but that does not seem to explain the origin of all the data we have already. Pointers?
Cheers,
Markus