Agreed. In Mix'n'match, 145 out of 177 catalogs have at least one instance
of two or more external IDs matched to a single Wikidata item. External
datasets, even curated ones, are messy.
Maybe the criterion should be "intended to be unique", or somesuch.
On Sun, Mar 6, 2016 at 9:18 AM Markus Krötzsch <
markus(a)semantic-mediawiki.org> wrote:
Another reason why "uniqueness" is not such a good criterion: it cannot
be applied to decide the type of a newly created property (no
statements, no uniqueness score). In general, the fewer statements there
are for a property, the more likely they are to be unique. The criterion
rewards data incompleteness (example: if Luca deletes the six multiple
ids he mentioned, then the property could be converted -- and he could
later add the statements again). If you think about it, it does not seem
like a very good idea to make the datatype of a property depend on its
current usage in Wikidata.
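
To illustrate the point about rewarding incompleteness, here is a minimal sketch. The thread does not define how a "uniqueness score" would be computed, so this assumes one possible definition: the share of items that carry exactly one value for the property. The function name and the item/ID values are made up for illustration.

```python
def uniqueness_score(statements):
    """Share of items with exactly one value for the property.

    `statements` is an iterable of (item_id, external_id) pairs.
    Returns None when there are no statements at all, which is the
    situation of a newly created property.
    """
    values_per_item = {}
    for item_id, external_id in statements:
        values_per_item.setdefault(item_id, set()).add(external_id)
    if not values_per_item:
        return None  # no statements, so no score can be computed
    single = sum(1 for ids in values_per_item.values() if len(ids) == 1)
    return single / len(values_per_item)


# Hypothetical data: Q1 carries two external ids, Q2 and Q3 one each.
full = [("Q1", "X1"), ("Q1", "X2"), ("Q2", "Y1"), ("Q3", "Z1")]
# Deleting every statement of the multi-id item makes the data less complete.
pruned = [s for s in full if s[0] != "Q1"]

print(uniqueness_score([]))      # None
print(uniqueness_score(full))    # 2 of 3 items have a single id
print(uniqueness_score(pruned))  # 1.0
```

Under this assumed definition, removing correct statements raises the score to 1.0, and an empty property has no score at all.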
Markus
On 05.03.2016 17:15, Markus Krötzsch wrote:
Hi,
I agree with Egon that the uniqueness requirement is rather weird. What
it means is that a thing is only considered an "identifier" if it points
to a database that uses a similar granularity for modelling the world as
Wikidata. If the external database is more fine-grained than Wikidata
(several ids for one item), then it is not a valid "identifier",
according to the uniqueness idea. I wonder what good this may do. In
particular, anybody who cares about uniqueness can easily determine it
from the data without any property type that says this.
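
As a rough sketch of what "determine it from the data" could look like, assuming the statements of one property are available as (item, external id) pairs. The function name and the sample values are hypothetical, and this is not tied to any particular Wikidata API:

```python
from collections import defaultdict


def uniqueness_report(statements):
    """Find violations of one-to-one mapping in (item_id, external_id) pairs.

    Returns two sorted lists: items that carry several external ids, and
    external ids that are attached to several items.
    """
    ids_per_item = defaultdict(set)
    items_per_id = defaultdict(set)
    for item_id, external_id in statements:
        ids_per_item[item_id].add(external_id)
        items_per_id[external_id].add(item_id)
    multi_id_items = sorted(i for i, ids in ids_per_item.items() if len(ids) > 1)
    shared_ids = sorted(e for e, items in items_per_id.items() if len(items) > 1)
    return multi_id_items, shared_ids


# Made-up sample: Q1 has two ids, and B-1 is used on both Q2 and Q3.
sample = [("Q1", "A-1"), ("Q1", "A-2"), ("Q2", "B-1"), ("Q3", "B-1")]
multi_id_items, shared_ids = uniqueness_report(sample)
print(multi_id_items)  # ['Q1']
print(shared_ids)      # ['B-1']
```

A property is unique in both directions exactly when both lists come back empty, so no dedicated property type is needed to answer the question.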
Markus
On 05.03.2016 15:35, Egon Willighagen wrote:
> On Sat, Mar 5, 2016 at 3:25 PM, Lydia Pintscher
> <Lydia.Pintscher(a)wikimedia.de> wrote:
>> On Sat, Mar 5, 2016 at 3:17 PM Egon Willighagen
>> <egon.willighagen(a)gmail.com> wrote:
>>> What is the exact process? Do you just plan to wait longer to see if
>>> anyone supports/contradicts my tagging? Should I get other Wikidata
>>> users and contributors to back up my suggestion?
>>
>> Add them to the list Katie linked if you think they should be
>> converted. We wait a bit to see if anyone disagrees and I also do a
>> quick sanity check for each property myself before conversion.
>
> I am adding comments for now. I am also looking at the comments for
> what it takes to be "identifier":
>
>
https://www.wikidata.org/wiki/User:Addshore/Identifiers#Characteristics_of_…
What is the resolution in these? There are some strong, often
contradictory, opinions...
For example, the uniqueness requirement is interesting... if an
identifier must be unique for a single Wikidata entry, this is
effectively disqualifying most identifiers used in the life
sciences... simply because Wikidata rarely models exactly the same
concept as the remote database does.
I'm sure we can give examples from any life science field, but
consider a gene: the concept of a gene in Wikidata is not like a gene
sequence in a DNA sequence database. Hence, an identifier from that
database could not be linked as "identifier" to that Wikidata entry.
Same for most identifiers for small organic compounds (like drugs,
metabolites, etc). I already commented on CAS (P231) and InChI (P234):
both are used as identifiers, but neither is unique to the concepts used
as "types" in Wikidata. The CAS for formaldehyde and formaline is
identical. The InChI may be unique, but only if you strongly type the
definition as a chemical graph instead of a substance (as it is now)...
etc.
So, the decision on which chemical identifiers should be marked as
"identifier" type depends on the resolution of those required
characteristics...
Can you please inform me about the state of those characteristics
(accepted or declined)?
Egon
Cheers
Lydia
--
Lydia Pintscher -
http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable
by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata