Re: [Wikidata] Data model explanation and protection

28 Oct 2015

The SPARQL query below returns 14 items.

Among them is https://www.wikidata.org/wiki/Q238509, which is "5-HT1A 
receptor human gene" in English and "5-HT₁A-Rezeptor Protein" in German. 
The last editor is ProteinBoxBot. It is coded by itself. That item 
has a split personality, so it seems that we need to do some cleaning.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item WHERE {
   # items carrying both a UniProt ID (P352) and a gene symbol (P353)
   ?item wdt:P352 ?uniprot ;
         wdt:P353 ?genesymbol .
}
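
If one item has several UniProt IDs or gene symbols, the query returns one row per combination, so the row count and the item count can differ. A minimal Python sketch for counting distinct items, assuming the result has been saved in the standard SPARQL JSON results format (the function name and the sample data are made up for illustration, not real query output):

```python
import json


def distinct_items(sparql_json: str) -> list:
    """Return the distinct Q-ids bound to ?item, in first-seen order."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    qids = []
    for row in bindings:
        # ?item comes back as a full entity URI; keep only the trailing Q-id.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        if qid not in qids:
            qids.append(qid)
    return qids


# Hand-written sample in the SPARQL JSON results format, NOT real query output.
SAMPLE = json.dumps({
    "head": {"vars": ["item"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q238509"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q238509"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q417782"}},
    ]},
})

print(len(distinct_items(SAMPLE)))  # three rows, two distinct items
```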

I now see that Teugnhausen has also merged items:
https://www.wikidata.org/wiki/Special:Contributions/Teugnhausen

/Finn

On 10/28/2015 06:07 PM, Benjamin Good wrote:
...
 The Gene Wiki team is experiencing a problem that may suggest some areas
 for improvement in the general wikidata experience.

 When our project was getting started, we had some fairly long public
 debates about how we should structure the data we wanted to load [1].
 These resulted in a data model that, we think, remains pretty much true
 to the semantics of the data, at the cost of distributing information
 about closely related things (genes, proteins, orthologs) across
 multiple, interlinked items.  Now, as long as these semantic links
 between the different item classes are maintained, this is working out
 great.  However, we are consistently seeing people merging items that
 our model needs to be distinct.  Most commonly, we see people merging
 items about genes with items about the protein product of the gene (e.g.
 [2]).  This happens nearly every day - especially on items related to
 the more popular Wikipedia articles. (More examples [3])

 Merges like this, as well as other semantics-breaking edits, make it
 very challenging to build downstream apps (like the wikipedia infobox)
 that depend on having certain structures in place.  My question to the
 list is how to best protect the semantic models that span multiple
 entity types in wikidata?  Related to this, is there an opportunity for
 some consistent way of explaining these structures to the community when
 they exist?

 I guess the immediate solutions are (1) to write another bot that
 watches for model-breaking edits and reverts them and (2) to create a
 page on wikidata somewhere that succinctly explains the model and
 links back to the discussions that went into its creation.
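
One possible shape for the check in (1), as a minimal sketch only: it assumes that an item carrying both a UniProt ID (P352) and a gene symbol (P353) is a suspect merge, which is the same heuristic as the SPARQL query earlier in this thread, and that the entity is available as a dict with a "claims" mapping keyed by property ID. The function name and the example entities are hypothetical, and the sketch flags items rather than reverting anything.

```python
# Assumed marker properties, taken from the query earlier in this thread.
PROTEIN_PROPS = {"P352"}  # UniProt ID
GENE_PROPS = {"P353"}     # gene symbol


def mixes_gene_and_protein(entity: dict) -> bool:
    """True if one item has both gene-side and protein-side statements."""
    used = set(entity.get("claims", {}))
    return bool(used & PROTEIN_PROPS) and bool(used & GENE_PROPS)


# Hand-written entities for illustration; the statement bodies are placeholders.
gene_only = {"id": "Q_gene", "claims": {"P353": [{}]}}
merged = {"id": "Q_merged", "claims": {"P352": [{}], "P353": [{}]}}

print(mixes_gene_and_protein(gene_only), mixes_gene_and_protein(merged))
```

A real bot would still need a review step, since the heuristic says nothing about which edit introduced the mix.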

 It seems that anyone that works beyond a single entity type is going to
 face the same kind of problems, so I'm posting this here in hopes that
 generalizable patterns (and perhaps even supporting code) can be
 realized by this community.

 [1] https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#D…
 [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
 [3] https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/m…

 _______________________________________________
 Wikidata mailing list
 Wikidata(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

-- 
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/
