Of course, Wikidata (by design) doesn't have formal typing of items, and which aspects of different classes can or can't be combined on the same item can be pretty domain-specific (and fluid).
So I think general checks that operate at the class/type level would be hard to specify. On the other hand, identifying particular pairs which should not be merged should, I think, be comparatively easy to record and comparatively easy to act on.
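As a rough illustration of the pair-based idea, here is a minimal Python sketch. It assumes the "do not merge" pairs are available as a plain in-memory set; the names DO_NOT_MERGE and merge_allowed are hypothetical and not part of any existing merge script. The example pair is the Reelin gene and protein items discussed in this thread.

```python
# Hypothetical sketch: record pairs of items that must not be merged
# and consult that record before a merge is attempted.

# Each pair is stored as a frozenset so that order does not matter.
# The single entry is illustrative: the Reelin gene and protein items.
DO_NOT_MERGE = {
    frozenset(("Q414043", "Q13561329")),
}


def merge_allowed(item_a, item_b, blocked=DO_NOT_MERGE):
    """Return False if the two item IDs are recorded as a blocked pair."""
    return frozenset((item_a, item_b)) not in blocked


if __name__ == "__main__":
    for a, b in [("Q414043", "Q13561329"), ("Q414043", "Q42")]:
        verdict = "ok" if merge_allowed(a, b) else "WARNING: recorded as distinct"
        print(f"merge {a} + {b}: {verdict}")
```

How such pairs would actually be stored on Wikidata (a dedicated property, a constraint, or something else) is left open here.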
-- James.
On 28/10/2015 20:24, Tom Morris wrote:
On Wed, Oct 28, 2015 at 3:55 PM, Benjamin Good ben.mcgee.good@gmail.com wrote:
It sounds like Tom and James have basically the same idea for our particular problem, which I would support: enable a warning in the merge script when incompatible types are detected. These would have to be encoded somehow though - presumably in the property constraints.
I think they differ semantically in that one operates at the class/type level, while the other operates on pairs of instances (if I understand the property's semantics). Another more general check might be to see if the proposed merge will result in any property values, such as P688 encodes, which point to themselves. That's usually a sign of a structural problem.
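The self-reference check could be sketched roughly as follows in Python. This assumes the item-valued claims of both items have already been fetched into a plain dictionary; the function name and the data layout are hypothetical, and the sample claims are illustrative rather than a copy of the live items.

```python
# Hypothetical sketch: before merging two items, look for claims that
# would point back at the merged item itself.

def self_references_after_merge(source_id, target_id, claims_by_item):
    """Return (item, property, value) triples that would become self-links.

    claims_by_item maps an item ID to {property ID: set of item IDs}.
    After a merge both IDs denote the same item, so any claim on either
    item whose value is one of the two IDs would point to itself.
    """
    merged_ids = {source_id, target_id}
    problems = []
    for item_id in (source_id, target_id):
        for prop, values in claims_by_item.get(item_id, {}).items():
            for value in values:
                if value in merged_ids:
                    problems.append((item_id, prop, value))
    return sorted(problems)


if __name__ == "__main__":
    # Illustrative data: gene item "encodes" (P688) the protein item,
    # and the protein item is assumed to link back via P702.
    claims = {
        "Q414043": {"P688": {"Q13561329"}},
        "Q13561329": {"P702": {"Q414043"}},
    }
    for triple in self_references_after_merge("Q13561329", "Q414043", claims):
        print("would self-reference:", triple)
```

A non-empty result would be a reason for the merge script to warn rather than proceed silently.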
Tom>>> For all languages except English, it's the protein Wikidata item [1] that points to the corresponding Wikipedia page, while for English it's the gene item [2] that points to the corresponding English article [3].
I don't think that this is ubiquitously true, though it is true in many cases. This happened because the original imports from Wikipedia tagged the Wikidata items about genes/proteins as proteins. We converted all the EN wikilinks that we knew about programmatically but shied away from doing that for all the other languages.
Sorry, I didn't mean to imply that it was generally true, but rather true for the example I was looking at (Reelin [3]). Since the opening sentence begins "Reelin is a large secreted extracellular matrix glycoprotein ...," I'd say that the article is about a protein [1], yet it's linked to a gene [2]. For articles which are about multiple Wikidata items, I guess another possible answer is that they shouldn't be linked to any item (or perhaps to all related items, if that's technically possible).
[1] https://www.wikidata.org/wiki/Q13561329
[2] https://www.wikidata.org/wiki/Q414043
[3] https://en.wikipedia.org/wiki/Reelin
Tom