Agreed. In Mix'n'match, 145 out of 177 catalogs have at least one instance of two or more external IDs matched to a single Wikidata item. External datasets, even curated ones, are messy.

Maybe the criterion should be "intended to be unique", or somesuch.
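A minimal sketch of the kind of count quoted above. It assumes the match data can be flattened into (catalog, external ID, Wikidata item) tuples. That shape, the function name and the toy data are all hypothetical and do not reflect Mix'n'match's actual schema or API.

```python
from collections import defaultdict


def catalogs_with_multi_matches(matches):
    """Return the catalogs in which at least one Wikidata item has two or
    more distinct external IDs matched to it.

    `matches` is an iterable of (catalog, external_id, qid) tuples. This
    shape is a hypothetical simplification, not Mix'n'match's real schema.
    """
    ids_per_item = defaultdict(set)  # (catalog, qid) -> set of external IDs
    for catalog, external_id, qid in matches:
        ids_per_item[(catalog, qid)].add(external_id)
    return {
        catalog
        for (catalog, _qid), ids in ids_per_item.items()
        if len(ids) >= 2
    }


# Made-up toy data, for illustration only.
matches = [
    ("catA", "a1", "Q1"),
    ("catA", "a2", "Q1"),  # a second catA ID matched to the same item
    ("catB", "b1", "Q2"),
    ("catB", "b2", "Q3"),
]
total = len({catalog for catalog, _, _ in matches})
multi = catalogs_with_multi_matches(matches)
print(f"{len(multi)} out of {total} catalogs: {sorted(multi)}")
# prints: 1 out of 2 catalogs: ['catA']
```

Grouping by (catalog, item) keeps the check per catalog, so two catalogs that each map one ID to the same item do not count as a multi-match.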

On Sun, Mar 6, 2016 at 9:18 AM Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Another reason why "uniqueness" is not such a good criterion: it cannot
be applied to decide the type of a newly created property (no
statements, no uniqueness score). In general, the fewer statements there
are for a property, the more likely they are to be unique. The criterion
rewards data incompleteness (example: if Luca deletes the six duplicate
IDs he mentioned, then the property could be converted, and he could
later add the statements again). If you think about it, it does not seem
like a very good idea to make the datatype of a property depend on its
current usage in Wikidata.
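A minimal sketch of this point, assuming "unique" means a one-to-one mapping: each item has at most one value for the property, and each value appears on at most one item. The function name and the toy statements are hypothetical.

```python
from collections import Counter


def is_one_to_one(statements):
    """Check a property's current (item, value) statements for uniqueness.

    Hypothetical reading of "unique": no item carries two values and no
    value is used on two items. An empty list passes vacuously.
    """
    values_per_item = Counter(item for item, _value in statements)
    items_per_value = Counter(value for _item, value in statements)
    return all(n == 1 for n in values_per_item.values()) and all(
        n == 1 for n in items_per_value.values()
    )


# A brand-new property with no statements is trivially "unique".
print(is_one_to_one([]))  # True

# Made-up statements where one item has two IDs.
statements = [("Q1", "x1"), ("Q1", "x2"), ("Q2", "y1")]
print(is_one_to_one(statements))  # False

# Deleting the extra statement flips the verdict, although the external
# database has not changed at all.
trimmed = [s for s in statements if s != ("Q1", "x2")]
print(is_one_to_one(trimmed))  # True
```

The verdict is computed entirely from the current data, which is why it can be recomputed by anyone at any time, and also why it changes as statements are added or removed.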

Markus

On 05.03.2016 17:15, Markus Krötzsch wrote:
> Hi,
>
> I agree with Egon that the uniqueness requirement is rather weird. What
> it means is that a thing is only considered an "identifier" if it points
> to a database that uses a similar granularity for modelling the world as
> Wikidata. If the external database is more fine-grained than Wikidata
> (several ids for one item), then it is not a valid "identifier",
> according to the uniqueness idea. I wonder what good this may do. In
> particular, anybody who cares about uniqueness can easily determine it
> from the data without any property type that says this.
>
> Markus
>
>
> On 05.03.2016 15:35, Egon Willighagen wrote:
>> On Sat, Mar 5, 2016 at 3:25 PM, Lydia Pintscher
>> <Lydia.Pintscher@wikimedia.de> wrote:
>>> On Sat, Mar 5, 2016 at 3:17 PM Egon Willighagen
>>> <egon.willighagen@gmail.com> wrote:
>>>> What is the exact process? Do you just plan to wait longer to see if
>>>> anyone supports/contradicts my tagging? Should I get other Wikidata
>>>> users and contributors to back up my suggestion?
>>>
>>> Add them to the list Katie linked if you think they should be
>>> converted. We wait a bit to see if anyone disagrees and I also do a
>>> quick sanity check for each property myself before conversion.
>>
>> I am adding comments for now. I am also looking at the comments on
>> what it takes to be an "identifier":
>>
>> https://www.wikidata.org/wiki/User:Addshore/Identifiers#Characteristics_of_external_identifiers
>>
>>
>> What is the resolution of these? There are some strong, often
>> contradictory, opinions...
>>
>> For example, the uniqueness requirement is interesting... if an
>> identifier must be unique for a single Wikidata entry, this
>> effectively disqualifies most identifiers used in the life
>> sciences... simply because Wikidata rarely has exactly the same
>> concept as the remote database.
>>
>> I'm sure we can give examples from any life science field, but
>> consider a gene: the concept of a gene in Wikidata is not like a gene
>> sequence in a DNA sequence database. Hence, an identifier from that
>> database could not be linked as an "identifier" to that Wikidata entry.
>>
>> Same for most identifiers for small organic compounds (like drugs,
>> metabolites, etc.). I already commented on CAS (P231) and InChI (P234):
>> both are used as identifiers, but neither is unique to the concepts used
>> as "types" in Wikidata. The CAS number for formaldehyde and formalin is
>> identical. The InChI may be unique, but only if you strongly type the
>> definition as a chemical graph instead of a substance (as it is now)...
>> etc.
>>
>> So, deciding which chemical identifiers should be marked with the
>> "identifier" type depends on how those required characteristics are
>> resolved...
>>
>> Can you please inform me about the state of those characteristics
>> (accepted or declined)?
>>
>> Egon
>>
>>> Cheers
>>> Lydia
>>> --
>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>> Product Manager for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Registered in the register of associations of the Amtsgericht
>>> Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
>>> by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
>>>
>>> _______________________________________________
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata