If an identifier system provides for merging of entities while retaining both of their previous IDs (as all good identifier systems which guarantee stable identifiers should), duplicate IDs are inevitable. Well-known examples include Freebase, MusicBrainz, OpenLibrary, and yes, even Wikipedia & Wikidata. Duplicates may be silently resolved, as is the case with Freebase; exposed as redirects, as with OpenLibrary and Wiki*; or handled with a hybrid, as with MusicBrainz (some page types redirect, others don't). Merged identities may be relatively rare (Freebase) or more common (OpenLibrary, MusicBrainz), but they'll always happen. Mandating uniqueness would force the "losing" IDs to be deleted from Wikidata, losing the benefit that they bring for enhancing and strengthening the mesh of identifiers.
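
To make the merge case concrete, here is a minimal sketch of how retained "losing" IDs can still be resolved to the surviving one. The redirect table and the IDs in it are made up for illustration, and none of this reflects the actual API of Freebase, MusicBrainz, or OpenLibrary.

```python
# Hypothetical merge table: each "losing" ID maps to the ID it was merged into.
# The IDs are invented for illustration only.
REDIRECTS = {
    "A1": "A2",  # A1 was merged into A2
    "A2": "A3",  # A2 was later merged into A3
}


def resolve(identifier, redirects):
    """Follow merge redirects until the surviving ID is reached."""
    seen = set()
    while identifier in redirects:
        if identifier in seen:
            raise ValueError(f"redirect cycle at {identifier}")
        seen.add(identifier)
        identifier = redirects[identifier]
    return identifier


def group_by_entity(ids, redirects):
    """Group the given IDs by the surviving ID they resolve to."""
    groups = {}
    for identifier in ids:
        groups.setdefault(resolve(identifier, redirects), []).append(identifier)
    return groups
```

With this table, `group_by_entity(["A1", "A3", "B7"], REDIRECTS)` puts A1 and A3 in the same group, so an item carrying both IDs still points at one external entity even though the IDs are not "unique".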

I've looked at the identifier list a couple of times with an eye towards helping with the curation, but I could never make heads or tails of what the criteria were, whether there was consensus about the criteria, why some perfectly acceptable identifiers were being vehemently argued against and on what grounds, etc. The "community" driving this process on those wiki pages seems to be just a handful of vocal and opinionated people. Is that going to generate good results?

Tom

On Sun, Mar 6, 2016 at 4:17 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Another reason why "uniqueness" is not such a good criterion: it cannot be applied to decide the type of a newly created property (no statements, no uniqueness score). In general, the fewer statements there are for a property, the more likely they are to be unique. The criterion rewards data incompleteness (example: if Luca deletes the six multiple ids he mentioned, then the property could be converted -- and he could later add the statements again). If you think about it, it does not seem like a very good idea to make the datatype of a property depend on its current usage in Wikidata.

Markus


On 05.03.2016 17:15, Markus Krötzsch wrote:
Hi,

I agree with Egon that the uniqueness requirement is rather weird. What
it means is that a thing is only considered an "identifier" if it points
to a database that uses a similar granularity for modelling the world as
Wikidata. If the external database is more fine-grained than Wikidata
(several ids for one item), then it is not a valid "identifier",
according to the uniqueness idea. I wonder what good this may do. In
particular, anybody who cares about uniqueness can easily determine it
from the data without any property type that says this.
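
As a rough sketch of that last point, uniqueness can be computed directly from the statements of a property. The input format (pairs of item and external id) and the sample values in the usage note are assumptions for illustration, not real Wikidata data or an existing tool.

```python
from collections import defaultdict


def uniqueness_report(statements):
    """Check uniqueness in both directions for one property.

    `statements` is an iterable of (item, external_id) pairs; this input
    shape is an assumption made for the sketch.
    Returns (items with several ids, ids shared by several items).
    """
    ids_per_item = defaultdict(set)
    items_per_id = defaultdict(set)
    for item, external_id in statements:
        ids_per_item[item].add(external_id)
        items_per_id[external_id].add(item)

    multi_id_items = {
        item: sorted(ids) for item, ids in ids_per_item.items() if len(ids) > 1
    }
    shared_ids = {
        ext: sorted(items) for ext, items in items_per_id.items() if len(items) > 1
    }
    return multi_id_items, shared_ids
```

For example, feeding it `("formaldehyde", "cas-x")`, `("formalin", "cas-x")`, `("gene-item", "seq-1")` and `("gene-item", "seq-2")` (placeholder values) reports one shared id and one item with two ids. A property with no statements trivially yields two empty results.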

Markus


On 05.03.2016 15:35, Egon Willighagen wrote:
On Sat, Mar 5, 2016 at 3:25 PM, Lydia Pintscher <Lydia.Pintscher@wikimedia.de> wrote:
On Sat, Mar 5, 2016 at 3:17 PM Egon Willighagen <egon.willighagen@gmail.com> wrote:
What is the exact process? Do you just plan to wait longer to see if
anyone supports/contradicts my tagging? Should I get other Wikidata
users and contributors to back up my suggestion?

Add them to the list Katie linked if you think they should be
converted. We wait a bit to see if anyone disagrees and I also do a
quick sanity check for each property myself before conversion.

I am adding comments for now. I am also looking at the comments on
what it takes to be an "identifier":

https://www.wikidata.org/wiki/User:Addshore/Identifiers#Characteristics_of_external_identifiers


What is the resolution on these? There are some strong, often
contradictory, opinions...

For example, the uniqueness requirement is interesting... if an
identifier must be unique for a single Wikidata entry, this is
effectively disqualifying most identifiers used in the life
sciences... simply because Wikidata rarely models exactly the same
concept as the remote database does.

I'm sure we can give examples from any life science field, but
consider a gene: the concept of a gene in Wikidata is not like a gene
sequence in a DNA sequence database. Hence, an identifier from that
database could not be linked as "identifier" to that Wikidata entry.

Same for most identifiers for small organic compounds (like drugs,
metabolites, etc). I already commented on CAS (P231) and InChI (P234):
both are used as identifiers, but neither is unique to the concepts used
as "types" in Wikidata. The CAS for formaldehyde and formalin is
identical. The InChI may be unique, but only if you strictly type the
definition as a chemical graph instead of a substance (as it is now)...
etc.

So, deciding which chemical identifiers should be marked as
"identifier" type depends on the resolution of those required
characteristics...

Can you please inform me about the state of those characteristics
(accepted or declined)?

Egon

Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata






