On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz
artist ID <https://www.wikidata.org/wiki/Property:P434> etc. and whether
Wikidata items should show all of them or 'preferred' ones only as we
did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
There are also other cases where external sites have duplicates that are not reconciled (yet). For example, Q46843 has multiple GeoNames ids:
http://sws.geonames.org/7602447
http://sws.geonames.org/2954602
The second was suggested by Freebase; the first is what Wikipedia had. I think the first is better (polygon rather than bounding box), so I marked it as preferred. This is a situation where we should keep multiple identifiers, since the external database really has two ids that are not integrated yet.
Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we are aware of the ids but they should not be used)
(3) Delete the redirect ids
I think (2) would be cleanest, since it keeps unaware users from re-adding the old ids. (3) would also be ok once the old id is no longer in circulation.
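
A minimal sketch of what option (2) could look like in code. This is only an illustration in Python: the Statement class, the apply_redirects function and the redirect map are hypothetical names, not part of any Wikidata or GeoNames API, and the redirect in the usage note is an assumed example (GeoNames has not necessarily merged these two ids).

```python
from dataclasses import dataclass


@dataclass
class Statement:
    value: str            # the external identifier
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def apply_redirects(statements, redirects):
    """Option (2): keep redirected ids, but mark them as deprecated.

    `redirects` maps an old id to the id it now redirects to
    (hypothetical input, e.g. collected by a bot from the external site).
    Returns a new list and leaves the input untouched.
    """
    present = {s.value for s in statements}
    result = []
    for s in statements:
        if s.value in redirects:
            # The id still exists as a redirect: keep it, but deprecate it.
            result.append(Statement(s.value, "deprecated"))
        else:
            result.append(Statement(s.value, s.rank))
    # Make sure the surviving id is present if it was not there before.
    for old, new in redirects.items():
        if old in present and new not in present:
            result.append(Statement(new, "normal"))
            present.add(new)
    return result
```

For the Q46843 example, calling apply_redirects with the statements [Statement("7602447", "preferred"), Statement("2954602")] and the assumed redirect map {"2954602": "7602447"} would keep both ids, leave 7602447 preferred and mark 2954602 as deprecated.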
Is there any benefit in removing old ids completely? I guess constraint reports will work better (but maybe constraint reports should not count deprecated statements in single value constraints ...). Other than this, I don't see a big reason to spend time on removing some ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
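
The suggestion about single value constraints can be sketched as follows. This is not how the actual constraint reports are implemented; the function name and the dict-based input are assumptions made for the sake of a small Python example.

```python
def single_value_violation(rank_by_id, count_deprecated=False):
    """Return True if more than one id counts towards the constraint.

    `rank_by_id` maps each identifier value to its rank
    ("preferred", "normal" or "deprecated"). By default,
    deprecated statements are ignored, as suggested above.
    """
    counted = [
        value
        for value, rank in rank_by_id.items()
        if count_deprecated or rank != "deprecated"
    ]
    return len(counted) > 1
```

With {"7602447": "preferred", "2954602": "deprecated"} this reports no violation, while the same data with count_deprecated=True does, which is the difference between the two ways a report could treat deprecated ids.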
Markus