Agreed. In Mix'n'match, 145 out of 177 catalogs have at least one instance of two or more external IDs matched to a single Wikidata item. External datasets, even curated ones, are messy.
Maybe the criterion should be "intended to be unique", or somesuch.
On Sun, Mar 6, 2016 at 9:18 AM Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Another reason why "uniqueness" is not such a good criterion: it cannot be applied to decide the type of a newly created property (no statements, no uniqueness score). In general, the fewer statements there are for a property, the more likely they are to be unique. The criterion rewards data incompleteness (example: if Luca deletes the six multiple ids he mentioned, then the property could be converted -- and he could later add the statements again). If you think about it, it does not seem like a very good idea to make the datatype of a property depend on its current usage in Wikidata.
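To make the point above concrete, here is a minimal sketch of what a usage-based "uniqueness score" might look like. The function name, the (item, external id) pair format, and the placeholder item ids are assumptions for illustration only; this is not how Wikidata or any conversion tool actually computes anything.

```python
from collections import defaultdict


def uniqueness_score(statements):
    """Fraction of items that carry exactly one distinct value for a property.

    `statements` is a list of (item_id, external_id) pairs (hypothetical format).
    Returns None when there are no statements, because no score can be computed.
    """
    if not statements:
        return None
    ids_by_item = defaultdict(set)
    for item_id, external_id in statements:
        ids_by_item[item_id].add(external_id)
    unique_items = sum(1 for ids in ids_by_item.values() if len(ids) == 1)
    return unique_items / len(ids_by_item)


# Placeholder data, not real Wikidata items.
new_property = []                                # freshly created property
sparse = [("item1", "a"), ("item2", "b")]        # few statements
fuller = sparse + [("item2", "c"), ("item3", "d")]  # more complete data

print(uniqueness_score(new_property))
print(uniqueness_score(sparse))
print(uniqueness_score(fuller))
```

Under these assumptions the empty property has no score at all, the sparse one looks perfectly unique, and the score only drops once more data is added, which is the incentive problem described above.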
Markus
On 05.03.2016 17:15, Markus Krötzsch wrote:
Hi,
I agree with Egon that the uniqueness requirement is rather weird. What it means is that a thing is only considered an "identifier" if it points to a database that uses a similar granularity for modelling the world as Wikidata. If the external database is more fine-grained than Wikidata (several ids for one item), then it is not a valid "identifier", according to the uniqueness idea. I wonder what good this may do. In particular, anybody who cares about uniqueness can easily determine it from the data without any property type that says this.
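As a rough illustration of "determining it from the data", the following sketch lists the items that have more than one distinct id for a property. The input format, function name, and item ids are hypothetical; in practice the pairs would come from a dump or a query.

```python
from collections import defaultdict


def items_with_multiple_ids(statements):
    """Return {item_id: sorted ids} for items with two or more distinct ids.

    `statements` is a list of (item_id, external_id) pairs (hypothetical format).
    """
    ids_by_item = defaultdict(set)
    for item_id, external_id in statements:
        ids_by_item[item_id].add(external_id)
    return {
        item_id: sorted(ids)
        for item_id, ids in ids_by_item.items()
        if len(ids) > 1
    }


# Placeholder data: one coarse-grained item matched to two finer-grained records.
example = [
    ("gene_item", "seq-001"),
    ("gene_item", "seq-002"),
    ("compound_item", "cmp-17"),
]

non_unique = items_with_multiple_ids(example)
print(non_unique)
print("unique for all items:", not non_unique)
```

Anyone who needs uniqueness for a particular use can run a check like this, independent of which datatype the property has.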
Markus
On 05.03.2016 15:35, Egon Willighagen wrote:
On Sat, Mar 5, 2016 at 3:25 PM, Lydia Pintscher <Lydia.Pintscher@wikimedia.de> wrote:
On Sat, Mar 5, 2016 at 3:17 PM Egon Willighagen <egon.willighagen@gmail.com> wrote:
What is the exact process? Do you just plan to wait longer to see if anyone supports/contradicts my tagging? Should I get other Wikidata users and contributors to back up my suggestion?
Add them to the list Katie linked if you think they should be converted. We wait a bit to see if anyone disagrees and I also do a quick sanity check for each property myself before conversion.
I am adding comments for now. I am also looking at the comments for what it takes to be "identifier":
https://www.wikidata.org/wiki/User:Addshore/Identifiers#Characteristics_of_e...
What is the resolution on these? There are some strong, often contradictory, opinions...
For example, the uniqueness requirement is interesting... if an identifier must be unique for a single Wikidata entry, this effectively disqualifies most identifiers used in the life sciences... simply because Wikidata rarely models exactly the same concept as the remote database.
I'm sure we can give examples from any life science field, but consider a gene: the concept of a gene in Wikidata is not like a gene sequence in a DNA sequence database. Hence, an identifier from that database could not be linked as "identifier" to that Wikidata entry.
Same for most identifiers for small organic compounds (like drugs, metabolites, etc.). I already commented on CAS (P231) and InChI (P234): both are used as identifiers, but neither is unique to the concepts used as "types" in Wikidata. The CAS number for formaldehyde and formalin is identical. The InChI may be unique, but only if you strongly type the definition as a chemical graph instead of a substance (as it is now)... etc.
So, deciding which chemical identifiers should be marked as "identifier" type depends on the resolution of those required characteristics...
Can you please inform me about the state of those characteristics (accepted or declined)?
Egon
Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata