Yes, I think the problem of maintaining a multi-class data model within Wikidata is a general one.  You could imagine similar scenarios in any domain.

Our particular gene/protein merge problem is specific to our work.  It is not just one user (Fullerene), though; this has been happening for a while and many users have participated.  See, e.g., the post here: https://www.wikidata.org/wiki/User_talk:Andrawaag#ProteinBoxBot_Mistake.3F
and here:
https://www.wikidata.org/wiki/User_talk:DGtal#Merging_items

On Wed, Oct 28, 2015 at 10:47 AM, Finn Årup Nielsen <fn@imm.dtu.dk> wrote:
Do you think it is a general problem? The few merges that I checked were all done by Fullerene, and s/he has now responded after Andrawaag left a note on the talk page: https://www.wikidata.org/wiki/User_talk:Fullerene


/Finn


On 10/28/2015 06:07 PM, Benjamin Good wrote:
The Gene Wiki team is experiencing a problem that may suggest some areas
for improvement in the general Wikidata experience.

When our project was getting started, we had some fairly long public
debates about how we should structure the data we wanted to load [1].
These resulted in a data model that, we think, remains pretty much true
to the semantics of the data, at the cost of distributing information
about closely related things (genes, proteins, orthologs) across
multiple, interlinked items.  Now, as long as these semantic links
between the different item classes are maintained, this is working out
great.  However, we are consistently seeing people merging items that
our model needs to be distinct.  Most commonly, we see people merging
items about genes with items about the protein product of the gene (e.g.
[2]).  This happens nearly every day - especially on items related to
the more popular Wikipedia articles. (More examples [3])

Merges like this, as well as other semantics-breaking edits, make it
very challenging to build downstream apps (like the Wikipedia infobox)
that depend on having certain structures in place.  My question to the
list is how best to protect the semantic models that span multiple
entity types in Wikidata.  Related to this, is there an opportunity for
some consistent way of explaining these structures to the community when
they exist?

I guess the immediate solutions are (1) to write another bot that
watches for model-breaking edits and reverts them and (2) to create a
page on Wikidata somewhere that succinctly explains the model and
links back to the discussions that went into its creation.

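As a rough illustration of the check such a bot in (1) might run, here
is a minimal sketch. It assumes each item can be reduced to the set of
class items it is asserted to belong to, and that Q7187 (gene) and Q8054
(protein) are the class markers in use; the names DISJOINT_CLASSES and
breaks_model are hypothetical, and fetching item data and reverting
edits are deliberately left out.

```python
# Hypothetical sketch: flag a merge that would collapse items the model keeps apart.
GENE = "Q7187"     # Wikidata item for "gene" (assumed to be the class marker in use)
PROTEIN = "Q8054"  # Wikidata item for "protein" (assumed to be the class marker in use)

# Pairs of classes that the data model requires to live on separate items.
DISJOINT_CLASSES = [(GENE, PROTEIN)]


def breaks_model(classes_a, classes_b):
    """Return True if merging two items would mix classes that must stay distinct.

    classes_a and classes_b are the sets of class item IDs asserted on each
    of the two items that someone proposes to merge.
    """
    for first, second in DISJOINT_CLASSES:
        if first in classes_a and second in classes_b:
            return True
        if second in classes_a and first in classes_b:
            return True
    return False


if __name__ == "__main__":
    # A gene item merged into a protein item violates the model.
    print(breaks_model({GENE}, {PROTEIN}))
    # Two gene items (e.g. true duplicates) can be merged safely.
    print(breaks_model({GENE}, {GENE}))
```

Keeping the disjoint pairs in a plain list means other projects with
multi-class models could supply their own pairs without changing the
check itself.
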
It seems that anyone who works beyond a single entity type is going to
face the same kind of problems, so I'm posting this here in hopes that
generalizable patterns (and perhaps even supporting code) can be
realized by this community.

[1]
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
[2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
[3]
https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/
