I tend to agree with Jerven. He is right to say that URIs work best as identifiers. However, some things should still be kept in mind:
* The strings we are talking about are in fact IDs and not ambiguous: no string id identifies multiple objects.
* The problem is in finding the right web page to refer a user to for each ID. URIs are often distinct from the URLs that users would like to read. It is even possible that there are already official URIs for some of the datasets we were talking about, and that these URIs do not help us in finding the right URL either.
In some datasets, the problem might be solved by switching to URIs, but this requires working content negotiation that redirects users when they open the URI in their browser. I have some doubts that we can find this for the problematic cases, given that they don't even have a simple redirection service for finding their URLs.
Moreover, there is a technical problem: the design selected for distinguishing external IDs in Wikidata requires these IDs to be of type string.
In a perfect world, Jerven's approach would still be the cleanest, I believe, but it might be impractical at the moment.
Cheers,
Markus
On 29.04.2016 15:29, Jerven Tjalling Bolleman wrote:
Could I be so bold as to suggest that in Wikidata we should strive to use external URIs for identifiers, not strings?
For example, Wikidata holds a lot of UniProt accessions, e.g. behind the property https://www.wikidata.org/wiki/P352, and there is a formatter that turns them into a URL.
I think this is the wrong way round: there should be a URL/URI there, and a formatter to generate a local string for display purposes.
And of course for ChEMBL the URL/URI to use would be
<http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL101690>
There are two advantages to this. The first is that it allows easier federated queries from the source databases into Wikidata (no URI conversions etc.). The second is that these URIs are clearly not ambiguous.
Regards, Jerven
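To make the two directions discussed above concrete, here is a minimal sketch in Python. It assumes a pattern with a `$1` placeholder, which is the convention Wikidata formatter URLs use. The pattern itself is only inferred from the single ChEMBL URI quoted in this thread, and the function names are hypothetical, not part of any Wikidata or ChEMBL API.

```python
from typing import Optional

# Assumed pattern, inferred from the one ChEMBL URI quoted in this thread.
CHEMBL_PATTERN = "http://rdf.ebi.ac.uk/resource/chembl/molecule/$1"


def format_uri(pattern: str, local_id: str) -> str:
    """String ID -> URI: substitute the ID for the $1 placeholder."""
    return pattern.replace("$1", local_id)


def extract_local_id(pattern: str, uri: str) -> Optional[str]:
    """URI -> display string: the reverse direction.

    Returns None when the URI does not fit the pattern.
    """
    prefix, _, suffix = pattern.partition("$1")
    if len(uri) < len(prefix) + len(suffix):
        return None
    if not uri.startswith(prefix) or not uri.endswith(suffix):
        return None
    local_id = uri[len(prefix):len(uri) - len(suffix)]
    return local_id or None


uri = format_uri(CHEMBL_PATTERN, "CHEMBL101690")
print(uri)
print(extract_local_id(CHEMBL_PATTERN, uri))
```

Either direction is a simple string operation as long as one pattern covers every ID of a property. The hard cases in this thread are those where no single pattern exists.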
On 28/04/16 23:49, Julie McMurry wrote:
"One should also point out to the authorities maintaining these IDs that they should spend some effort on producing a workable solution for this. It seems they should be the first to provide a resolver service (or maybe it would be an "ID search engine" if it is so complicated).
With the qualifiers in place, Wikidata can also be used to achieve this, of course, but it seems we are just manually reverse engineering something that should be done at the site of whoever is controlling the ID registration."
Well said, Markus. Hearty agreement on my side; this is something colleagues and I have been trying to raise awareness of for a long time now (http://bit.ly/id-guidance). One of the challenges is that databases are already being asked to do more with less. They can see the utility of such a service to others, but when I've asked DBs before (not naming names), there has been little traction (I've yet to ask ChEMBL). Sometimes it works out though. For instance, KEGG used to have 12 different type-specific URLs, corresponding to:
kegg.compound kegg.disease kegg.drug kegg.environ kegg.genes kegg.genome kegg.glycan kegg.metagenome kegg.module kegg.orthology kegg.pathway kegg.reaction
Thankfully, they've collapsed those to a single URL pattern.
The databases that find it the toughest are not those who simply don't embed typing, but rather those that don't embed typing AND ALSO have local identifiers that would otherwise collide. For instance, a prominent bio database is in this boat (not naming names) and would like to make things better but it is hard and messy due to the collisions.
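A small sketch of the collision problem described above, in Python. The records are invented for illustration and do not come from any real database, and the `type:id` form is just one hypothetical way of embedding the type.

```python
from collections import defaultdict

# Invented records for illustration only: (entity type, local ID).
records = [
    ("gene", "1234"),
    ("pathway", "1234"),   # same local ID as the gene above
    ("compound", "C00031"),
]


def find_collisions(records):
    """Map each local ID used by more than one entity type to those types."""
    types_by_id = defaultdict(set)
    for entity_type, local_id in records:
        types_by_id[local_id].add(entity_type)
    return {
        local_id: sorted(types)
        for local_id, types in types_by_id.items()
        if len(types) > 1
    }


def qualify(entity_type, local_id):
    """Embed the type in the identifier so it is unique on its own."""
    return f"{entity_type}:{local_id}"


collisions = find_collisions(records)
qualified = {qualify(entity_type, local_id) for entity_type, local_id in records}
print(collisions)
print(sorted(qualified))
```

Without the type, the local ID alone cannot select a single record, so no single URL pattern can resolve it. Once the type is part of the identifier, the three records are distinct again.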
FYI 345 of the 560+ records in the identifiers.org (http://identifiers.org) corpus are type-specific at the level of the identifiers.org namespace; these roll up to ~300 providers.
The question though is what Wikidata is trying to accomplish. Say you encounter the ChEMBL ID CHEMBL308052 (http://linkedchemistry.info/chembl/chemblid/CHEMBL308052): do you need to retrieve the type of the entity for reasons other than determining what URL to use?
How are you representing entity labels / IDs to users?
Best, Julie
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata